Abstract. In this paper we study the effect of stochastic errors on two constrained incremental
sub-gradient algorithms. We view the incremental sub-gradient algorithms as decentralized network 
optimization algorithms as applied to minimize a sum of functions, when each component function 
is known only to a particular agent of a distributed network. We first study the standard cyclic 
incremental sub-gradient algorithm in which the agents form a ring structure and pass the iterate in 
a cycle. We consider the method with stochastic errors in the sub-gradient evaluations and provide 
sufficient conditions on the moments of the stochastic errors that guarantee almost sure convergence 
when a diminishing step-size is used. We also obtain almost sure bounds on the algorithm's
performance when a constant step-size is used. We then consider the Markov randomized incremental
subgradient method, which is a non-cyclic version of the incremental algorithm where the sequence of 
computing agents is modeled as a time non-homogeneous Markov chain. Such a model is appropriate
for mobile networks, as the network topology changes across time in these networks. We establish 
the convergence results and error bounds for the Markov randomized method in the presence of 
stochastic errors for diminishing and constant step-sizes, respectively. 

1. Introduction. A problem of recent interest in distributed networks is the 
design of decentralized algorithms to minimize a sum of functions, when each component
function is known only to a particular agent [4,9,20,24,29]. Such problems
arise in many network applications, including in-network estimation, learning, signal 
processing and resource allocation [6,16,28-31]. In these applications, there is no 
central coordinator that has access to all the information and, thus, decentralized 
algorithms are needed to solve the problems. In this paper, we consider decentralized
subgradient methods for constrained minimization of a sum of convex functions, when 
each component function is only known partially (with stochastic errors) to a specific 
network agent. We study two incremental subgradient methods with stochastic errors: 
a cyclic and a (non-cyclic) Markov randomized incremental method. 

The cyclic incremental algorithm is a decentralized method in which the network 
agents form a ring and process the information cyclically. The incremental method was 
originally proposed by Kibardin [11] and has been extensively studied more recently 
in [4,7,17,20,33]. Incremental gradient algorithms were first used for optimizing the 
weights in neural network training [7,17,33], and most of the associated literature
deals with differentiable non-convex unconstrained optimization problems [3,4,7,33,34].
The incremental subgradient algorithm for non-differentiable constrained convex
optimization has been investigated in [19,20] without errors, and in [13,21,29,34] where 
the effects of deterministic errors are considered. The algorithm that we consider in 
this paper is stochastic and as such differs from the existing literature. For this 
algorithm, we establish convergence for diminishing step-size and provide an error 
bound for constant step-size. 

The Markov randomized incremental algorithm is a decentralized method where 
the iterates are generated incrementally within the network by passing them from 
agent to agent. Unlike the cyclic incremental method, where the agent network has 
a ring structure and the information flow is along this ring (cycle), in the Markov 
randomized incremental method, the network can have arbitrary structure. However,
similar to the cyclic incremental method, in the Markov randomized incremental method,
only one agent updates at any given time. In particular, an agent in the network
updates the current iterate (by processing locally its own objective function) and
either passes the new iterate to a randomly chosen neighbor or processes it again.
Thus the order in which the agents update the iterates is random. This class of 
incremental algorithms was first proposed in [20], where the agent that receives the 
iterate is chosen with uniform probability in each iteration (corresponding to the case 
of a fully connected agent network). Recently, this idea has been extended in [9] 
to the case where the sequence in which the agents process the information is a 
time homogeneous Markov chain. The rationale behind this model is that the agent 
updating the information at a given time is more likely to pass this information to 
a close neighbor rather than to an agent who is further away. In this paper, we 
consider a more general framework than that of [9] by allowing the sequence in which 
the agents process the information to be a time non-homogeneous Markov chain.^1
We prove the algorithm convergence for diminishing step-size and establish an error 
bound for a constant step-size. This extends the results in [9], where an error bound 
for a homogeneous Markov randomized incremental subgradient method is discussed 
for a constant step-size and error-free case. 

The Markov randomized incremental algorithm is also related to the decentralized 
computation model in [2, 38] for stochastic optimization problems. However, the 
emphasis in these studies is on parallel processing where each agent completely knows 
the entire objective function to be minimized. More closely related is the work
in [22], which develops a "parallel" version of the unconstrained incremental
subgradient algorithm. Also related is the constrained consensus problem studied 
in [25] where agents are interested in obtaining a solution to a feasibility problem, 
when different parts of the problem are known to different agents. At a much broader
scale, the paper is also related to the literature on distributed averaging and consensus 
algorithms [2, 8, 10, 22, 25, 26, 35, 36, 38, 39]. 

Our main contributions in this paper are the development and analysis of the 
Markov randomized incremental method with stochastic subgradients and the use of 
a time non-homogeneous Markov model for the sequence of computing agents. In 
addition, to the best of our knowledge, this is among the few attempts made at
studying the effects of stochastic errors on the performance of decentralized optimization
algorithms. The other studies are [15,36,37], but the algorithms considered are
fundamentally different from the incremental algorithms studied in this paper.

The paper is organized as follows. In Section 2 we formulate the problem of
interest, and introduce the cyclic incremental and Markov randomized incremental
methods with stochastic errors. We also discuss some applications that motivate our
interest in these methods. In Section 3 we analyze convergence properties of the
cyclic incremental method. We establish convergence of the method under diminishing
step-size and provide an error bound for the method with a constant step-size, both
valid with probability 1. We establish analogous results for the Markov randomized
incremental method in Section 4. We give some concluding remarks in Section 5.



1 The primary motivation for studying such a model is mobile networks, where the network
connectivity structure is changing in time and, thus, the set of the neighbors of an agent is time-varying.

2 In that work, the components of the decision vector are distributed while the objective function 
is known to all agents. In contrast, in this paper, the objective function data is distributed, while 
each agent has an estimate of the entire decision vector. 
2. Problem Formulation and Motivation. We consider a network of $m$
agents, indexed by $i = 1, \ldots, m$. The network objective is to solve the following
problem:

$$
\begin{array}{ll}
\text{minimize} & f(x) = \sum_{i=1}^m f_i(x) \\
\text{subject to} & x \in X,
\end{array}
\tag{2.1}
$$

where $x \in \Re^n$ is a decision or a parameter vector, $X$ is a closed and convex subset
of $\Re^n$, and each $f_i$ is a convex function from $\Re^n$ to $\Re$ that is known only to agent
$i$. Problems with the above structure arise in the context of estimation in sensor
networks [29,30], where $x$ is an unknown parameter to be estimated and $f_i$ is the
cost function that is determined by the $i$-th sensor's observations (for example, $f_i$
could be the log-likelihood function in maximum likelihood estimation). Furthermore,
problems with such structure also arise in resource allocation in data networks. In
this context, $x$ is the resource vector to be allocated among $m$ agents and $f_i$ is the
utility function for agent $i$ [6]. We discuss these in more detail later.

To solve the problem (2.1) in a network where agents are connected in a directed
ring structure, we consider the cyclic incremental subgradient method [20]. Time is
slotted and in each time slot, the estimate is passed by an agent to the next agent
along the ring. In particular, agent $i$ receives the iterate from agent $i - 1$, and
updates the received estimate using a subgradient of its own objective function $f_i$.
The updated iterate is then communicated to the next agent in the cycle, which is
agent $i + 1$ when $i < m$ and agent 1 when $i = m$. We are interested in the case where
the agent subgradient evaluations have random errors. Formally, the algorithm is
given as follows:

$$
\begin{aligned}
z_{0,k+1} &= z_{m,k} = x_k, \\
z_{i,k+1} &= P_X\left[ z_{i-1,k+1} - \alpha_{k+1}\left( \nabla f_i(z_{i-1,k+1}) + \epsilon_{i,k+1} \right) \right],
\end{aligned}
\tag{2.2}
$$

where the initial iterate $x_0 \in X$ is chosen at random. The vector $x_k$ is the estimate at
the end of cycle $k$, $z_{i,k+1}$ is the intermediate estimate obtained after agent $i$ updates
in the $(k+1)$-st cycle, $\nabla f_i(x)$ is a subgradient of $f_i$ evaluated at $x$, and $\epsilon_{i,k+1}$ is a random
error. The scalar $\alpha_{k+1}$ is a positive step-size and $P_X$ denotes Euclidean projection
onto the set $X$. We study the convergence properties of method (2.2) in Section 3 for
diminishing and constant step-sizes.
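The cyclic update can be illustrated with a small simulation. The following is a minimal sketch under assumptions that are not part of the paper: a scalar decision variable, an interval constraint set $X = [lo, hi]$, component functions $f_i(x) = |x - a_i|$, zero-mean Gaussian errors, and the step-size $\alpha_{k+1} = 1/(k+1)$. The function names are hypothetical.

```python
import random

def project_interval(x, lo, hi):
    """Euclidean projection of a scalar onto X = [lo, hi]."""
    return min(max(x, lo), hi)

def cyclic_incremental(targets, lo, hi, x0, num_cycles, noise_std, rng):
    """Cyclic incremental subgradient sketch for f(x) = sum_i |x - targets[i]|."""
    x = x0
    for k in range(num_cycles):
        alpha = 1.0 / (k + 1)              # diminishing step-size alpha_{k+1}
        z = x                              # z_{0,k+1} = x_k
        for a in targets:                  # agents 1..m update in ring order
            # a subgradient of |z - a| at z (0 is a valid choice at the kink)
            subgrad = (z > a) - (z < a)
            error = rng.gauss(0.0, noise_std)   # zero-mean stochastic error
            z = project_interval(z - alpha * (subgrad + error), lo, hi)
        x = z                              # x_{k+1} = z_{m,k+1}
    return x

rng = random.Random(0)
x_final = cyclic_incremental([1.0, 2.0, 6.0], lo=0.0, hi=10.0, x0=4.0,
                             num_cycles=5000, noise_std=0.2, rng=rng)
print(x_final)
```

For these component functions the sum is minimized at the median of the targets, so the printed value is expected to lie close to 2.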

In addition, for a network of agents with arbitrary connectivity, we consider an 
incremental algorithm where the agent that updates is selected randomly according 
to a distribution depending on the agent that performed the most recent update. 
Formally, in this method the iterates are generated according to the following rule:

$$
x_{k+1} = P_X\left[ x_k - \alpha_{k+1}\left( \nabla f_{s(k+1)}(x_k) + \epsilon_{s(k+1),k+1} \right) \right],
\tag{2.3}
$$

where the initial iterate $x_0 \in X$ is chosen at random and the agent $s(0)$ that initializes
the method is also selected at random. The integer $s(k + 1)$ is the index of the agent
that performs the update at time $k + 1$, and the sequence $\{s(k)\}$ is modeled as a time
non-homogeneous Markov chain with state space $\{1, \ldots, m\}$. In particular, if agent $i$
was processing at time $k$, then the agent $j$ will be selected to perform the update at
time $k+1$ with probability $[P(k)]_{i,j}$. Formally, we have

$$
\text{Prob}\{ s(k + 1) = j \mid s(k) = i \} = [P(k)]_{i,j}.
$$

When there are no errors ($\epsilon_{s(k+1),k+1} = 0$) and the probabilities $[P(k)]_{i,j}$ are all equal
to $1/m$, the method in (2.3) coincides with the incremental method with randomization
that was proposed and studied in [20].
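To make the role of the transition matrices $P(k)$ concrete, the following minimal sketch simulates rule (2.3) for scalar iterates. Everything specific in it is an assumption made for illustration: the component functions $f_i(x) = |x - a_i|$, the interval constraint, the Gaussian errors, the step-size $(k+1)^{-3/4}$, and the hypothetical schedule that alternates between a lazy ring walk and an all-to-all matrix. Both matrices are symmetric, so every agent is visited equally often in the long run.

```python
import random

def ring_matrix(m):
    """Lazy walk on a ring: stay w.p. 1/2, move to either ring neighbor w.p. 1/4."""
    P = [[0.0] * m for _ in range(m)]
    for i in range(m):
        P[i][i] = 0.5
        P[i][(i + 1) % m] += 0.25
        P[i][(i - 1) % m] += 0.25
    return P

def uniform_matrix(m):
    """All-to-all matrix: every agent is selected with probability 1/m."""
    return [[1.0 / m] * m for _ in range(m)]

def markov_randomized(targets, lo, hi, x0, num_steps, noise_std, rng):
    """Markov randomized incremental sketch for f(x) = sum_i |x - targets[i]|."""
    m = len(targets)
    schedule = [ring_matrix(m), uniform_matrix(m)]   # hypothetical time-varying P(k)
    x = x0
    s = rng.randrange(m)                             # initial agent s(0)
    for k in range(num_steps):
        row = schedule[k % 2][s]                     # row s(k) of P(k)
        s = rng.choices(range(m), weights=row)[0]    # draw s(k+1)
        alpha = (k + 1) ** -0.75                     # diminishing step-size
        subgrad = (x > targets[s]) - (x < targets[s])
        error = rng.gauss(0.0, noise_std)
        x = min(max(x - alpha * (subgrad + error), lo), hi)   # projection onto [lo, hi]
    return x

rng = random.Random(0)
x_final = markov_randomized([1.0, 2.0, 3.0, 5.0, 8.0], lo=0.0, hi=10.0, x0=4.0,
                            num_steps=20000, noise_std=0.2, rng=rng)
print(x_final)
```

With these targets the sum is minimized at the median, 3, so the printed value is expected to lie near 3.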
Following [9], we refer to the method in (2.3) as the Markov randomized incremental
stochastic algorithm. We analyze convergence properties of this method in
Section 4 for diminishing and constant step-sizes.

2.1. Motivation. As mentioned, we study the convergence properties of the
incremental algorithms (2.2) and (2.3) for diminishing and constant step-size, and
for zero and non-zero mean errors. Such errors may arise directly as computational 
round-off errors, which are of interest when the entire network is on a single chip and 
each agent is a processor on the chip [18]. In addition, stochastic errors also arise in 
the following context. 

Let the function $f_i(x)$ have the following form

$$
f_i(x) = E[g_i(x, R_i)],
$$

where $E[\cdot]$ denotes the expectation, $R_i \in \Re^d$ is a random vector and $g_i : \Re^n \times \Re^d \to \Re$.
Agent $i$ does not know the statistics of $R_i$, and thus does not know its complete
objective function $f_i$. However, agent $i$ sequentially observes independent samples
of $R_i$ and uses these samples to determine an approximate subgradient using the
Robbins-Monro approximation [32] or Kiefer-Wolfowitz approximation [12]. These
approximate subgradients can be considered to be the actual subgradient corrupted
by stochastic errors.
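The decomposition "actual subgradient plus stochastic error" can be illustrated numerically. The sketch below assumes, purely for illustration, a scalar $x$, the function $g(x, r) = (r - x)^2$, and Gaussian samples of $R$; none of these choices come from the paper. The sample gradient differs from the true gradient of $f(x) = E[(R - x)^2]$ by an error whose empirical mean is close to zero.

```python
import random

def sample_gradient(x, r):
    """Gradient in x of g(x, r) = (r - x)**2 for one observed sample r."""
    return 2.0 * (x - r)

def true_gradient(x, mean_r):
    """Gradient of f(x) = E[(R - x)**2]; it requires the mean of R, unknown to the agent."""
    return 2.0 * (x - mean_r)

rng = random.Random(1)
x, mean_r = 0.7, 2.0
# error = sampled gradient minus true gradient, one value per observed sample
errors = [sample_gradient(x, rng.gauss(mean_r, 1.0)) - true_gradient(x, mean_r)
          for _ in range(20000)]
mean_error = sum(errors) / len(errors)
print(mean_error)
```

In this example each error equals $2(E[R] - r)$, so the errors have zero mean and bounded second moment.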

We next discuss some specific problems that fall within the framework that we 
consider and can be solved using the proposed methods. 

Distributed Regression. Consider $m$ sensors that sense a time invariant spatial
field. Let $r_{i,k}$ be the measurement made by the $i$-th sensor in time slot $k$. Let $s_i$ be the
location of the $i$-th sensor. Let $h(s, x)$ be a set of candidate models for the spatial field
that are selected based on a priori information and parameterized by $x$. Thus, for each
$x$, the candidate $h(s, x)$ is a model for the spatial field and $h(s_i, x)$ is a model for the
measurement $r_{i,k}$. The problem in regression is to choose the best model among the
set of candidate models based on the collected measurements $r_{i,k}$, i.e., to determine
the value for $x$ that best describes the spatial field. In least squares regression, the
parameter value $x^*$ corresponding to the best model satisfies the following relation:

$$
x^* \in \operatorname{Argmin}_{x \in X} \lim_{N \to \infty} \sum_{i=1}^m \frac{1}{N} \sum_{k=1}^N \left( r_{i,k} - h(s_i, x) \right)^2.
$$

When the measurements $r_{i,k}$ are corrupted by i.i.d. noise, then the preceding relation
is equivalent to the following:

$$
x^* \in \operatorname{Argmin}_{x \in X} \sum_{i=1}^m E\left[ \left( R_i - h(s_i, x) \right)^2 \right].
$$

In linear least squares regression, the models $h(s_i, x)$, $i = 1, \ldots, m$, are linear in $x$, so
that each of the functions $f_i(x) = E\left[ (R_i - h(s_i, x))^2 \right]$ is convex in $x$.

Distributed Resource Allocation. An important problem in wireless networks
is the fair rate allocation problem [6]. Consider a wireless network represented
by a graph with a set of directed (communication) links. Suppose that there are $m$
flows indexed $1, \ldots, m$, whose rate can be adjusted, and let $x_i$ be the rate of the $i$-th
flow. Each flow is characterized by a source node $b(i)$ and a destination node $e(i)$.
The rate vector $x$ must satisfy some constraints that are imposed by the individual
link capacities of the network. For example, if there are multiple flows (or parts of
the flows) that use a link of total capacity $c$, then the total sum of the rates of the
flows routed through that link must not be greater than $c$. Apart from this there could
also be some queuing delay constraints. Thus, only flow vectors that are constrained
to a set $X$ can be routed through the network. Associated with each flow, there is
a reward function $U_i(x_i)$ depending only on the flow rate $x_i$ and known only at the
source node $b(i)$. The reward function is typically a concave and increasing function.
In the fair rate allocation problem, the source nodes $\{b(i)\}$ need to determine the
optimal flow rate $x^*$ that maximizes the total network utility. Mathematically, the
problem is to determine $x^*$ such that

$$
x^* \in \operatorname{Argmax}_{x \in X} \sum_{i=1}^m U_i(x_i).
$$

In some networks, the same flow can communicate different types of traffic that have
different importance in different time slots. For example, in an intruder detection
network, a "detected" message is more important (and is rewarded/weighted more)
than a "not detected" message or some other system message. Thus, the reward
function is also a function of the contents of the flow: if the type of flow $i$ in time slot
$k$ is $r_{i,k}$, where $r_{i,k}$ takes values from the set of all possible types of flow data, then
the reward is $U_i(x_i, r_{i,k})$ at time $k$. When the type of traffic on each flow across slots
is i.i.d., the fair rate allocation problem can be written as

$$
\max_{x \in X} \sum_{i=1}^m E\left[ U_i(x_i, R_i) \right].
$$

The statistics of Ri may not be known since they may depend upon external factors 
such as the frequency of intruders in an intruder detection network. 

2.2. Notation and Basics. We view vectors as columns. We write $x^T y$ to
denote the inner product of two vectors $x$ and $y$. We use $\|\cdot\|$ to denote the standard
Euclidean norm. For a vector $x$, we use $x_i$ to denote its $i$-th component. For a matrix
$A$, we use $[A]_{i,j}$ to denote its $(i,j)$-th entry, $[A]_i$ its $i$-th row and $[A]^j$ its $j$-th column.
We use $e$ to denote a vector with each entry equal to 1.

We use $f^*$ to denote the optimal value of the problem (2.1), and we use $X^*$ to
denote its optimal set. Throughout the paper, we assume that the optimal value $f^*$
is finite.

In our analysis, we use the subgradient defining property. Specifically, for a convex
function $f : \Re^n \to \Re$, the vector $\nabla f(x)$ is a subgradient of $f$ at $x$ when the following
relation is satisfied:

$$
\nabla f(x)^T (y - x) \le f(y) - f(x) \quad \text{for all } y \in \Re^n
\tag{2.4}
$$

(see, e.g., [1]).
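The subgradient relation (2.4) can be checked numerically for a non-differentiable function. The sketch below uses the $\ell_1$ norm on $\Re^3$ with the sign vector as one valid subgradient; the choice of function and the random test points are assumptions made for illustration only.

```python
import random

def f(x):
    """f(x) = ||x||_1, a convex but non-differentiable function."""
    return sum(abs(c) for c in x)

def subgradient(x):
    """One subgradient of the l1 norm: the sign vector (0 where a component is 0)."""
    return [(c > 0) - (c < 0) for c in x]

def inequality_gap(x, y):
    """f(y) - f(x) - g^T (y - x); relation (2.4) says this is non-negative."""
    g = subgradient(x)
    return f(y) - f(x) - sum(gi * (yi - xi) for gi, xi, yi in zip(g, x, y))

rng = random.Random(2)
points = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
points.append([0.0, 0.5, -0.5])     # include a point where f has a kink
min_gap = min(inequality_gap(x, y) for x in points for y in points)
print(min_gap >= -1e-12)            # small tolerance for floating-point rounding
```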

3. Cyclic Incremental Subgradient Algorithm. Recall that the cyclic incremental
stochastic subgradient algorithm is given by

$$
\begin{aligned}
z_{0,k+1} &= z_{m,k} = x_k, \\
z_{i,k+1} &= P_X\left[ z_{i-1,k+1} - \alpha_{k+1}\left( \nabla f_i(z_{i-1,k+1}) + \epsilon_{i,k+1} \right) \right],
\end{aligned}
\tag{3.1}
$$

where $x_0 \in X$ is a random initial vector, $\nabla f_i(x)$ is a subgradient of $f_i$ at $x$, $\epsilon_{i,k+1}$ is
a random noise vector and $\alpha_{k+1} > 0$ is a step-size.

The main difficulty in the study of the incremental stochastic subgradient algorithm
is that the expected direction in which the iterate is adjusted in each sub-iteration
is not necessarily a subgradient of the objective function $f$. For this reason,
we cannot directly apply the classic stochastic approximation convergence results
of [5,14,27] to study the convergence of the method in (2.2). The key relation in our
analysis is provided in Lemma 3.1 in Section 3.1. Using this lemma and a standard
super-martingale convergence result, we obtain results for diminishing step-size
in Theorem 3.3. Furthermore, by considering a related "stopped" process to which
we apply a standard super-martingale convergence result, we obtain the error bound
results for a constant step-size in Theorems 3.4 and 3.5.

We make the following basic assumptions on the set $X$ and the functions $f_i$.

Assumption 1. The set $X \subseteq \Re^n$ is closed and convex. The function $f_i : \Re^n \to \Re$
is convex for each $i \in \{1, \ldots, m\}$.

In our analysis, we assume that the first and the second moments of the subgradient
noise $\epsilon_{i,k}$ are bounded uniformly over the agents, conditionally on the past
realizations. In particular, we define $F_k^i$ as the $\sigma$-algebra generated by $x_0$ and the
subgradient errors $\epsilon_{1,1}, \ldots, \epsilon_{i,k}$, and assume the following.

Assumption 2. There exist deterministic scalar sequences $\{\mu_k\}$ and $\{\nu_k\}$ such
that

$$
\begin{aligned}
\left\| E\left[ \epsilon_{i,k} \mid F_k^{i-1} \right] \right\| &\le \mu_k \quad \text{for all } i \text{ and } k, \\
E\left[ \|\epsilon_{i,k}\|^2 \mid F_k^{i-1} \right] &\le \nu_k^2 \quad \text{for all } i \text{ and } k.
\end{aligned}
$$

Assumption 2 holds, for example, when the errors $\epsilon_{i,k}$ are independent across both
$i$ and $k$, and have finite moments. Note that under the assumption that the second
moments are bounded, from Jensen's inequality we readily have

$$
\left\| E\left[ \epsilon_{i,k} \mid F_k^{i-1} \right] \right\| \le \sqrt{ E\left[ \|\epsilon_{i,k}\|^2 \mid F_k^{i-1} \right] } \le \nu_k.
\tag{3.2}
$$

However, for a constant step-size, the terms $\| E[\epsilon_{i,k} \mid F_k^{i-1}] \|$ and $E[\|\epsilon_{i,k}\|^2 \mid F_k^{i-1}]$
affect the error bounds on the performance of the incremental method differently,
as seen in Section 3.3. For this reason, we prefer to use different upper bounds for
the terms $\| E[\epsilon_{i,k} \mid F_k^{i-1}] \|$ and $E[\|\epsilon_{i,k}\|^2 \mid F_k^{i-1}]$. We will also, without any loss of
generality, assume that $\mu_k \le \nu_k$.

We also assume that the subgradients $\nabla f_i(x)$ are uniformly bounded over the
set $X$ for each $i$. This assumption is commonly used in the convergence analysis of
subgradient methods with a diminishing or a constant step-size.

Assumption 3. For every $i$, the subgradient set of the function $f_i$ at $x \in X$ is
nonempty and uniformly bounded over the set $X$ by a constant $C_i$, i.e.,

$$
\|\nabla f_i(x)\| \le C_i \quad \text{for all subgradients } \nabla f_i(x) \text{ and for all } x \in X.
$$

Assumption 3 holds, for example, when each $f_i$ is a polyhedral function or when
the set $X$ is compact.
3.1. Preliminaries. In this section, we provide a lemma establishing a basic
relation for the iterates generated by the incremental method (3.1) and any step-size
rule. This relation plays a key role in our subsequent development.

Lemma 3.1. Let Assumptions 1, 2, and 3 hold. Then, the iterates generated by
algorithm (3.1) are such that for any step-size rule and for any $y \in X$,

$$
\begin{aligned}
E\left[ \|d_{k+1}(y)\|^2 \mid F_k^m \right] \le{}& \|d_k(y)\|^2 - 2\alpha_{k+1}\left( f(x_k) - f(y) \right) \\
&+ 2\alpha_{k+1}\mu_{k+1} \sum_{i=1}^m E\left[ \|d_{i-1,k+1}(y)\| \mid F_k^m \right] \\
&+ \alpha_{k+1}^2 \Big( \sum_{i=1}^m C_i + m\nu_{k+1} \Big)^2,
\end{aligned}
\tag{3.3}
$$

where $d_k(y) = x_k - y$ and $d_{i,k+1}(y) = z_{i,k+1} - y$ for all $k$.

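Proof. Using the iterate update rule in (3.1) and the non-expansive property of
the Euclidean projection, we obtain for any $y \in X$,

$$
\begin{aligned}
\|d_{i,k+1}(y)\|^2 &= \left\| P_X\left[ z_{i-1,k+1} - \alpha_{k+1}\nabla f_i(z_{i-1,k+1}) - \alpha_{k+1}\epsilon_{i,k+1} \right] - y \right\|^2 \\
&\le \left\| z_{i-1,k+1} - \alpha_{k+1}\nabla f_i(z_{i-1,k+1}) - \alpha_{k+1}\epsilon_{i,k+1} - y \right\|^2 \\
&= \|d_{i-1,k+1}(y)\|^2 - 2\alpha_{k+1} d_{i-1,k+1}(y)^T \nabla f_i(z_{i-1,k+1}) \\
&\quad - 2\alpha_{k+1} d_{i-1,k+1}(y)^T \epsilon_{i,k+1} + \alpha_{k+1}^2 \left\| \epsilon_{i,k+1} + \nabla f_i(z_{i-1,k+1}) \right\|^2.
\end{aligned}
$$

Taking conditional expectations with respect to the $\sigma$-field $F_{k+1}^{i-1}$, we further obtain

$$
\begin{aligned}
E\left[ \|d_{i,k+1}(y)\|^2 \mid F_{k+1}^{i-1} \right] \le{}& \|d_{i-1,k+1}(y)\|^2 - 2\alpha_{k+1} d_{i-1,k+1}(y)^T \nabla f_i(z_{i-1,k+1}) \\
&- 2\alpha_{k+1} d_{i-1,k+1}(y)^T E\left[ \epsilon_{i,k+1} \mid F_{k+1}^{i-1} \right] \\
&+ \alpha_{k+1}^2 E\left[ \left\| \epsilon_{i,k+1} + \nabla f_i(z_{i-1,k+1}) \right\|^2 \mid F_{k+1}^{i-1} \right].
\end{aligned}
\tag{3.4}
$$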



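We now estimate the last two terms in the right hand side of the preceding
equation by using Assumption 2 on the error moments. In particular, we have for all
$i$ and $k$,

$$
- d_{i-1,k+1}(y)^T E\left[ \epsilon_{i,k+1} \mid F_{k+1}^{i-1} \right] \le \|d_{i-1,k+1}(y)\| \left\| E\left[ \epsilon_{i,k+1} \mid F_{k+1}^{i-1} \right] \right\| \le \mu_{k+1} \|d_{i-1,k+1}(y)\|.
$$

Next, we estimate the last term in Eq. (3.4) by using Assumption 2 on the error
moments and Assumption 3 on the subgradient norms. We have for all $i$ and $k$,

$$
\begin{aligned}
E\left[ \left\| \epsilon_{i,k+1} + \nabla f_i(z_{i-1,k+1}) \right\|^2 \mid F_{k+1}^{i-1} \right]
&= E\left[ \|\epsilon_{i,k+1}\|^2 \mid F_{k+1}^{i-1} \right] + \|\nabla f_i(z_{i-1,k+1})\|^2 \\
&\quad + 2 \nabla f_i(z_{i-1,k+1})^T E\left[ \epsilon_{i,k+1} \mid F_{k+1}^{i-1} \right] \\
&\le E\left[ \|\epsilon_{i,k+1}\|^2 \mid F_{k+1}^{i-1} \right] + \|\nabla f_i(z_{i-1,k+1})\|^2 \\
&\quad + 2 \|\nabla f_i(z_{i-1,k+1})\| \left\| E\left[ \epsilon_{i,k+1} \mid F_{k+1}^{i-1} \right] \right\| \\
&\le (\nu_{k+1} + C_i)^2,
\end{aligned}
$$

where in the last inequality we use $E[\|\epsilon_{i,k}\|^2 \mid F_k^{i-1}] \le \nu_k^2$ and $\| E[\epsilon_{i,k} \mid F_k^{i-1}] \| \le \nu_k$
for all $k$ [cf. Eq. (3.2)].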



Combining the preceding two relations and the inequality in
(3.4), we obtain for all $y \in X$,

$$
\begin{aligned}
E\left[ \|d_{i,k+1}(y)\|^2 \mid F_{k+1}^{i-1} \right] \le{}& \|d_{i-1,k+1}(y)\|^2 - 2\alpha_{k+1} d_{i-1,k+1}(y)^T \nabla f_i(z_{i-1,k+1}) \\
&+ 2\alpha_{k+1}\mu_{k+1} \|d_{i-1,k+1}(y)\| + \alpha_{k+1}^2 (\nu_{k+1} + C_i)^2.
\end{aligned}
\tag{3.5}
$$

We now estimate the second term in the right hand side of the preceding relation.
From the subgradient inequality in (2.4) we have

$$
\begin{aligned}
- d_{i-1,k+1}(y)^T \nabla f_i(z_{i-1,k+1}) &= - (z_{i-1,k+1} - y)^T \nabla f_i(z_{i-1,k+1}) \\
&\le - \left( f_i(z_{i-1,k+1}) - f_i(y) \right) \\
&= - \left( f_i(x_k) - f_i(y) \right) - \left( f_i(z_{i-1,k+1}) - f_i(x_k) \right) \\
&\le - \left( f_i(x_k) - f_i(y) \right) - \left( \nabla f_i(x_k) \right)^T (z_{i-1,k+1} - x_k) \qquad (3.6) \\
&\le - \left( f_i(x_k) - f_i(y) \right) + C_i \|z_{i-1,k+1} - x_k\|. \qquad (3.7)
\end{aligned}
$$

In (3.6) we have again used the subgradient inequality (2.4) to bound $f_i(z_{i-1,k+1}) - f_i(x_k)$,
while in (3.7) we have used the subgradient norm bound from Assumption 3.
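We next consider the term $\|z_{i-1,k+1} - x_k\|$. From (3.1) we have

$$
\|z_{i-1,k+1} - x_k\| = \Big\| \sum_{j=1}^{i-1} (z_{j,k+1} - z_{j-1,k+1}) \Big\| \le \sum_{j=1}^{i-1} \|z_{j,k+1} - z_{j-1,k+1}\|.
$$

By the non-expansive property of the projection, we further have

$$
\|z_{i-1,k+1} - x_k\| \le \alpha_{k+1} \sum_{j=1}^{i-1} \left( \|\nabla f_j(z_{j-1,k+1})\| + \|\epsilon_{j,k+1}\| \right) \le \alpha_{k+1} \sum_{j=1}^{i-1} \left( C_j + \|\epsilon_{j,k+1}\| \right).
\tag{3.8}
$$

By combining the preceding relation with Eq. (3.7), we have

$$
- d_{i-1,k+1}(y)^T \nabla f_i(z_{i-1,k+1}) \le - \left( f_i(x_k) - f_i(y) \right) + \alpha_{k+1} C_i \sum_{j=1}^{i-1} \left( C_j + \|\epsilon_{j,k+1}\| \right).
$$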
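By substituting the preceding estimate in the inequality in (3.5), we obtain for all
$y \in X$,

$$
\begin{aligned}
E\left[ \|d_{i,k+1}(y)\|^2 \mid F_{k+1}^{i-1} \right] \le{}& \|d_{i-1,k+1}(y)\|^2 - 2\alpha_{k+1}\left( f_i(x_k) - f_i(y) \right) \\
&+ 2\alpha_{k+1}^2 C_i \sum_{j=1}^{i-1} \left( C_j + \|\epsilon_{j,k+1}\| \right) \\
&+ 2\alpha_{k+1}\mu_{k+1} \|d_{i-1,k+1}(y)\| + \alpha_{k+1}^2 (C_i + \nu_{k+1})^2.
\end{aligned}
$$

Taking the expectation conditional on $F_k^m$, we obtain

$$
\begin{aligned}
E\left[ \|d_{i,k+1}(y)\|^2 \mid F_k^m \right] \le{}& E\left[ \|d_{i-1,k+1}(y)\|^2 \mid F_k^m \right] - 2\alpha_{k+1}\left( f_i(x_k) - f_i(y) \right) \\
&+ 2\alpha_{k+1}\mu_{k+1} E\left[ \|d_{i-1,k+1}(y)\| \mid F_k^m \right] \\
&+ 2\alpha_{k+1}^2 C_i \sum_{j=1}^{i-1} (C_j + \nu_{k+1}) + \alpha_{k+1}^2 (C_i + \nu_{k+1})^2,
\end{aligned}
$$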
where we have used Assumption 2 and Jensen's inequality to bound $E[\|\epsilon_{j,k+1}\| \mid F_k^m]$
by $\nu_{k+1}$ [cf. Eq. (3.2)]. Summing over $i = 1, \ldots, m$, and noting that $d_{0,k+1}(y) = x_k - y$,
we see that

$$
\begin{aligned}
E\left[ \|d_{k+1}(y)\|^2 \mid F_k^m \right] \le{}& \|d_k(y)\|^2 - 2\alpha_{k+1}\left( f(x_k) - f(y) \right) \\
&+ 2\alpha_{k+1}\mu_{k+1} \sum_{i=1}^m E\left[ \|d_{i-1,k+1}(y)\| \mid F_k^m \right] \\
&+ 2\alpha_{k+1}^2 \sum_{i=1}^m C_i \sum_{j=1}^{i-1} (C_j + \nu_{k+1}) + \sum_{i=1}^m \alpha_{k+1}^2 (C_i + \nu_{k+1})^2.
\end{aligned}
$$

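Finally, by noting that

$$
2 \sum_{i=1}^m C_i \sum_{j=1}^{i-1} (C_j + \nu_{k+1}) + \sum_{i=1}^m (C_i + \nu_{k+1})^2 \le \Big( \sum_{i=1}^m C_i + m\nu_{k+1} \Big)^2,
$$

we obtain for all $y \in X$ and all $k$,

$$
\begin{aligned}
E\left[ \|d_{k+1}(y)\|^2 \mid F_k^m \right] \le{}& \|d_k(y)\|^2 - 2\alpha_{k+1}\left( f(x_k) - f(y) \right) \\
&+ 2\alpha_{k+1}\mu_{k+1} \sum_{i=1}^m E\left[ \|d_{i-1,k+1}(y)\| \mid F_k^m \right] \\
&+ \alpha_{k+1}^2 \Big( \sum_{i=1}^m C_i + m\nu_{k+1} \Big)^2.
\end{aligned}
$$

□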
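The algebraic step that closes the proof can be verified directly. In the worked expansion below, $a_i = C_i + \nu_{k+1}$ is a shorthand introduced only for this check; the bound uses $a_i \ge C_i \ge 0$, which follows from $\nu_{k+1} \ge 0$.

```latex
\begin{align*}
\Big( \sum_{i=1}^m C_i + m\nu_{k+1} \Big)^2
  &= \Big( \sum_{i=1}^m a_i \Big)^2
   = \sum_{i=1}^m a_i^2 + 2 \sum_{i=1}^m a_i \sum_{j=1}^{i-1} a_j \\
  &\ge \sum_{i=1}^m (C_i + \nu_{k+1})^2
     + 2 \sum_{i=1}^m C_i \sum_{j=1}^{i-1} (C_j + \nu_{k+1}).
\end{align*}
```

The two sides coincide when $\nu_{k+1} = 0$, that is, in the error-free case.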

3.2. Convergence for diminishing step-size. We here study the convergence
of the method in (3.1) for a diminishing step-size rule. In our analysis, we use the
following result due to Robbins and Siegmund (see Lemma 11, Chapter 2.2, [27]).

Lemma 3.2. Let $(\Omega, \mathcal{F}, \mathcal{P})$ be a probability space and let $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \ldots$ be a
sequence of sub $\sigma$-fields of $\mathcal{F}$. Let $u_k$, $v_k$ and $w_k$, $k = 0, 1, 2, \ldots$, be non-negative
$\mathcal{F}_k$-measurable random variables and let $\{q_k\}$ be a deterministic sequence. Assume that
$\sum_{k=0}^\infty q_k < \infty$, and $\sum_{k=0}^\infty w_k < \infty$ and

$$
E\left[ u_{k+1} \mid \mathcal{F}_k \right] \le (1 + q_k) u_k - v_k + w_k
$$

hold with probability 1. Then, with probability 1, the sequence $\{u_k\}$ converges to a
non-negative random variable and $\sum_{k=0}^\infty v_k < \infty$.

We next provide a convergence result for diminishing step-sizes. 

Theorem 3.3. Let Assumptions 1, 2, and 3 hold. Assume that the step-size
sequence $\{\alpha_k\}$ is positive and such that $\sum_{k=1}^\infty \alpha_k = \infty$ and $\sum_{k=1}^\infty \alpha_k^2 < \infty$. In addition,
assume that the bounds $\mu_k$ and $\nu_k$ on the moments of the error sequence $\{\epsilon_{i,k}\}$ are
such that

$$
\sum_{k=1}^\infty \alpha_k \mu_k < \infty, \qquad \sum_{k=1}^\infty \alpha_k^2 \nu_k^2 < \infty.
$$

Also, assume that the optimal set $X^*$ is nonempty. Then, the iterate sequence $\{x_k\}$
generated by the method (3.1) converges to an optimal solution with probability 1.
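As an illustration of sequences that meet these requirements, read here as $\sum_k \alpha_k = \infty$, $\sum_k \alpha_k^2 < \infty$, $\sum_k \alpha_k \mu_k < \infty$ and $\sum_k \alpha_k^2 \nu_k^2 < \infty$, consider the following hypothetical choice, where $\bar\nu > 0$ is a constant introduced only for this example. It also respects $\mu_k \le \nu_k$.

```latex
\begin{align*}
&\alpha_k = \frac{1}{k}, \qquad \mu_k = \frac{\bar\nu}{k}, \qquad \nu_k = \bar\nu: \\
&\sum_{k=1}^\infty \alpha_k = \sum_{k=1}^\infty \frac{1}{k} = \infty, \qquad
 \sum_{k=1}^\infty \alpha_k^2 = \sum_{k=1}^\infty \frac{1}{k^2} < \infty, \\
&\sum_{k=1}^\infty \alpha_k \mu_k = \bar\nu \sum_{k=1}^\infty \frac{1}{k^2} < \infty, \qquad
 \sum_{k=1}^\infty \alpha_k^2 \nu_k^2 = \bar\nu^2 \sum_{k=1}^\infty \frac{1}{k^2} < \infty.
\end{align*}
```

Zero-mean errors with uniformly bounded second moments correspond to $\mu_k = 0$ and constant $\nu_k$, in which case the step-size conditions alone suffice.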
Proof. First note that all the assumptions of Lemma 3.1 are satisfied. Let $x^*$ be
an arbitrary point in $X^*$. By letting $y = x^*$ in Lemma 3.1, we obtain for any $x^* \in X^*$,

$$
\begin{aligned}
E\left[ \|d_{k+1}(x^*)\|^2 \mid F_k^m \right] \le{}& \|d_k(x^*)\|^2 - 2\alpha_{k+1}\left( f(x_k) - f^* \right) \\
&+ 2\alpha_{k+1}\mu_{k+1} \sum_{i=1}^m E\left[ \|d_{i-1,k+1}(x^*)\| \mid F_k^m \right] \\
&+ \alpha_{k+1}^2 \Big( m\nu_{k+1} + \sum_{i=1}^m C_i \Big)^2.
\end{aligned}
\tag{3.9}
$$



We relate $\|d_{i-1,k+1}(x^*)\|$ to $\|d_k(x^*)\|$ by using the triangle inequality of norms,

$$
\|d_{i-1,k+1}(x^*)\| = \|z_{i-1,k+1} - x_k + x_k - x^*\| \le \|z_{i-1,k+1} - x_k\| + \|d_k(x^*)\|.
$$

Substituting for $\|z_{i-1,k+1} - x_k\|$ from (3.8) we obtain

$$
\|d_{i-1,k+1}(x^*)\| \le \alpha_{k+1} \sum_{j=1}^{i-1} \left( C_j + \|\epsilon_{j,k+1}\| \right) + \|d_k(x^*)\|.
$$

Taking conditional expectations, we further obtain

$$
E\left[ \|d_{i-1,k+1}(x^*)\| \mid F_k^m \right] \le \|d_k(x^*)\| + \alpha_{k+1} \sum_{j=1}^{i-1} (C_j + \nu_{k+1}),
$$

where we have used Assumption 2 and Jensen's inequality to bound $E[\|\epsilon_{j,k+1}\| \mid F_k^m]$
by $\nu_{k+1}$. Using the preceding inequality in (3.9), we have

$$
\begin{aligned}
E\left[ \|d_{k+1}(x^*)\|^2 \mid F_k^m \right] \le{}& \|d_k(x^*)\|^2 - 2\alpha_{k+1}\left( f(x_k) - f^* \right) \\
&+ 2m\alpha_{k+1}\mu_{k+1} \|d_k(x^*)\| + 2\alpha_{k+1}^2 \mu_{k+1} \sum_{i=1}^m \sum_{j=1}^{i-1} (C_j + \nu_{k+1}) \\
&+ \alpha_{k+1}^2 \Big( m\nu_{k+1} + \sum_{i=1}^m C_i \Big)^2.
\end{aligned}
$$



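Next, using the inequality

$$
2\|d_k(x^*)\| \le 1 + \|d_k(x^*)\|^2,
$$

we obtain

$$
\begin{aligned}
E\left[ \|d_{k+1}(x^*)\|^2 \mid F_k^m \right] \le{}& (1 + m\alpha_{k+1}\mu_{k+1}) \|d_k(x^*)\|^2 - 2\alpha_{k+1}\left( f(x_k) - f^* \right) \\
&+ m\alpha_{k+1}\mu_{k+1} + 2\alpha_{k+1}^2 \mu_{k+1} \sum_{i=1}^m \sum_{j=1}^{i-1} (C_j + \nu_{k+1}) \\
&+ \alpha_{k+1}^2 \Big( m\nu_{k+1} + \sum_{i=1}^m C_i \Big)^2.
\end{aligned}
\tag{3.10}
$$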
By the assumptions on the step-size, and the sequences $\{\mu_k\}$ and $\{\nu_k\}$, we further
have

$$
\begin{aligned}
&\sum_{k=0}^\infty \alpha_{k+1}\mu_{k+1} < \infty, \\
&\sum_{k=0}^\infty \alpha_{k+1}^2 \mu_{k+1} \sum_{i=1}^m \sum_{j=1}^{i-1} (C_j + \nu_{k+1}) \le 2 \sum_{k=0}^\infty \sum_{i=1}^m \sum_{j=1}^{i-1} \left( \alpha_{k+1}^2 \mu_{k+1} C_j + \alpha_{k+1}^2 \nu_{k+1}^2 \right) < \infty, \\
&\sum_{k=0}^\infty \alpha_{k+1}^2 \Big( m\nu_{k+1} + \sum_{i=1}^m C_i \Big)^2 \le 2 \sum_{k=0}^\infty \alpha_{k+1}^2 \Big( m^2 \nu_{k+1}^2 + \Big( \sum_{i=1}^m C_i \Big)^2 \Big) < \infty,
\end{aligned}
$$

where in the second relation above, we have used $\mu_{k+1} \le \nu_{k+1}$ [cf. Eq. (3.2)], while in
the last inequality, we have used $(a + b)^2 \le 2(a^2 + b^2)$ valid for any scalars $a$ and $b$.
Thus, the conditions of Lemma 3.2 are satisfied with $u_k = \|d_k(x^*)\|^2$, $\mathcal{F}_k = F_k^m$,
$q_k = m\alpha_{k+1}\mu_{k+1}$, $v_k = 2\alpha_{k+1}\left( f(x_k) - f^* \right)$ and

$$
w_k = m\alpha_{k+1}\mu_{k+1} + 2\alpha_{k+1}^2 \mu_{k+1} \sum_{i=1}^m \sum_{j=1}^{i-1} (C_j + \nu_{k+1}) + \alpha_{k+1}^2 \Big( m\nu_{k+1} + \sum_{i=1}^m C_i \Big)^2.
$$

Therefore, with probability 1, the scalar $\|d_{k+1}(x^*)\|^2$ converges to some non-negative
random variable for every $x^* \in X^*$. Also with probability 1, we have

$$
\sum_{k=0}^\infty \alpha_{k+1}\left( f(x_k) - f^* \right) < \infty.
$$

Since $\sum_{k=1}^\infty \alpha_k = \infty$, it follows that $\liminf_{k \to \infty} f(x_k) = f^*$ with probability 1. By
considering a sample path for which $\liminf_{k \to \infty} f(x_k) = f^*$ and $\|d_{k+1}(x^*)\|^2$ converges
for any $x^*$, we conclude that the sample sequence must converge to some $x^* \in X^*$
in view of continuity of $f$. Hence, the sequence $\{x_k\}$ converges to some vector in $X^*$
with probability 1. □

Note that under the assumptions of Theorem 3.3, it can be seen that $E[\text{dist}(x_k, X^*)^2]$
also converges to 0. In particular, since the solution set $X^*$ is closed and convex, there
exists a point $x_k^* \in X^*$ that is closest to $x_k$ for every $k$. Letting $x^* = x_k^*$ in relation
(3.10) and using the fact that $\text{dist}(x_{k+1}, X^*) \le \|d_{k+1}(x_k^*)\|$ with probability 1, we obtain
for all $k$,

$$
\begin{aligned}
E\left[ \text{dist}(x_{k+1}, X^*)^2 \mid F_k^m \right] \le{}& (1 + m\alpha_{k+1}\mu_{k+1}) \, \text{dist}(x_k, X^*)^2 - 2\alpha_{k+1}\left( f(x_k) - f^* \right) \\
&+ m\alpha_{k+1}\mu_{k+1} + 2\alpha_{k+1}^2 \mu_{k+1} \sum_{i=1}^m \sum_{j=1}^{i-1} (C_j + \nu_{k+1}) \\
&+ \alpha_{k+1}^2 \Big( m\nu_{k+1} + \sum_{i=1}^m C_i \Big)^2.
\end{aligned}
$$
Taking expectations, we obtain for all $k$,

$$
\begin{aligned}
E\left[ \text{dist}(x_{k+1}, X^*)^2 \right] \le{}& (1 + m\alpha_{k+1}\mu_{k+1}) E\left[ \text{dist}(x_k, X^*)^2 \right] - 2\alpha_{k+1}\left( E[f(x_k)] - f^* \right) \\
&+ m\alpha_{k+1}\mu_{k+1} + 2\alpha_{k+1}^2 \mu_{k+1} \sum_{i=1}^m \sum_{j=1}^{i-1} (C_j + \nu_{k+1}) \\
&+ \alpha_{k+1}^2 \Big( m\nu_{k+1} + \sum_{i=1}^m C_i \Big)^2.
\end{aligned}
$$

From the deterministic analog of Lemma 3.2, we can argue that $E[\text{dist}(x_{k+1}, X^*)^2]$
converges and $\liminf_{k \to \infty} E[f(x_k)] = f^*$. Since $\{x_k\}$ converges to a point in $X^*$ with
probability 1, it follows that $E[\text{dist}(x_{k+1}, X^*)^2]$ converges to 0.



3.3. Error bound for constant step-size. Here, we study the behavior of the
iterates $\{x_k\}$ generated by the method (3.1) with a constant step-size rule, i.e., $\alpha_k = \alpha$
for all $k$. In this case, we cannot guarantee the convergence of the iterates; however,
we can provide bounds on the performance of the algorithm. In the following theorem,
we provide an error bound for the expected values $E[f(x_k)]$ and a bound for $\inf_k f(x_k)$
that holds with probability 1. The proofs of these results are similar to those used
in [19].

Theorem 3.4. Let Assumptions 1 and 2 hold. Let the sequence $\{x_k\}$ be generated
by the method (3.1) with a constant step-size rule, i.e., $\alpha_k = \alpha$ for all $k \ge 1$. Also,
assume that the set $X$ is bounded, and

$$
\mu = \sup_{k \ge 1} \mu_k < \infty, \qquad \nu = \sup_{k \ge 1} \nu_k < \infty.
$$

We then have

$$
\liminf_{k \to \infty} E[f(x_k)] \le f^* + m\mu \max_{x,y \in X} \|x - y\| + \frac{\alpha}{2} \Big( \sum_{i=1}^m C_i + m\nu \Big)^2,
\tag{3.11}
$$

and with probability 1,

$$
\inf_{k \ge 0} f(x_k) \le f^* + m\mu \max_{x,y \in X} \|x - y\| + \frac{\alpha}{2} \Big( \sum_{i=1}^m C_i + m\nu \Big)^2.
\tag{3.12}
$$

Proof. Since $X$ is compact and each $f_i$ is convex over $\Re^n$, the subgradients of $f_i$
are bounded over $X$ for each $i$. Thus, all the assumptions of Lemma 3.1 are satisfied.
Furthermore, the optimal set $X^*$ is non-empty. Since $\mu_k \le \mu$ and $\nu_k \le \nu$ for all $k$,
and $\|d_{i-1,k+1}(y)\| \le \max_{x,y \in X} \|x - y\|$, according to the relation of Lemma 3.1 we
have for $y = x^* \in X^*$,

$$
\begin{aligned}
E\left[ \|d_{k+1}(x^*)\|^2 \mid F_k^m \right] \le{}& \|d_k(x^*)\|^2 - 2\alpha\left( f(x_k) - f^* \right) + 2m\alpha\mu \max_{x,y \in X} \|x - y\| \\
&+ \alpha^2 \Big( \sum_{i=1}^m C_i + m\nu \Big)^2.
\end{aligned}
\tag{3.13}
$$
By taking the total expectation, we obtain for all $x^* \in X^*$ and all $k$,

$$
\begin{aligned}
E\left[ \|d_{k+1}(x^*)\|^2 \right] \le{}& E\left[ \|d_k(x^*)\|^2 \right] - 2\alpha\left( E[f(x_k)] - f^* \right) + 2m\alpha\mu \max_{x,y \in X} \|x - y\| \\
&+ \alpha^2 \Big( \sum_{i=1}^m C_i + m\nu \Big)^2.
\end{aligned}
$$

Now, assume that the relation (3.11) does not hold. Then there will exist a $\gamma > 0$
and an index $k_\gamma$ such that for all $k > k_\gamma$,

$$
E[f(x_k)] \ge f^* + \gamma + m\mu \max_{x,y \in X} \|x - y\| + \frac{\alpha}{2} \Big( \sum_{i=1}^m C_i + m\nu \Big)^2.
$$

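Therefore, for $k > k_\gamma$, we have

$$
\begin{aligned}
E\left[ \|d_{k+1}(x^*)\|^2 \right] \le{}& E\left[ \|d_k(x^*)\|^2 \right] - 2\alpha \Big( \gamma + m\mu \max_{x,y \in X} \|x - y\| + \frac{\alpha}{2} \Big( \sum_{i=1}^m C_i + m\nu \Big)^2 \Big) \\
&+ 2m\alpha\mu \max_{x,y \in X} \|x - y\| + \alpha^2 \Big( \sum_{i=1}^m C_i + m\nu \Big)^2 \\
\le{}& E\left[ \|d_k(x^*)\|^2 \right] - 2\alpha\gamma.
\end{aligned}
$$

Hence, for $k > k_\gamma$,

$$
E\left[ \|d_{k+1}(x^*)\|^2 \right] \le E\left[ \|d_{k_\gamma}(x^*)\|^2 \right] - 2\gamma\alpha (k - k_\gamma).
$$

For sufficiently large $k$, the right hand side of the preceding relation is negative,
yielding a contradiction. Thus the relation (3.11) must hold.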
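We now prove the relation in (3.12). Define the set

$$L_N = \Big\{ x \in X : f(x) < f^* + \frac{1}{N} + m\mu \max_{x,y \in X} \|x - y\| + \frac{\alpha}{2} \Big( \sum_{i=1}^m C_i + m\nu \Big)^2 \Big\}.$$

Let $x^* \in X^*$ and define the sequence $\hat{x}_k$ as follows:

$$\hat{x}_{k+1} = \begin{cases} x_{k+1} & \text{if } \hat{x}_k \notin L_N, \\ x^* & \text{if } \hat{x}_k \in L_N. \end{cases}$$

Thus, the process $\{\hat{x}_k\}$ is identical to the process $\{x_k\}$, until $\{x_k\}$ enters the set $L_N$.
Define

$$\hat{d}_k(y) = \hat{x}_k - y.$$

Let us first consider the case when $\hat{x}_k \in L_N$. Since $\hat{x}_k = x^*$ and $\hat{x}_{k+1} = x^*$, we have
$\hat{d}_k(x^*) = 0$ and $\hat{d}_{k+1}(x^*) = 0$, yielding

$$E[\|\hat{d}_{k+1}(x^*)\|^2 \mid F_k^m] = \|\hat{d}_k(x^*)\|^2. \tag{3.14}$$

When $\hat{x}_k \notin L_N$, $\hat{x}_k = x_k$ and $\hat{x}_{k+1} = x_{k+1}$. Using relation (3.13), we conclude that

$$E[\|\hat{d}_{k+1}(x^*)\|^2 \mid F_k^m] \le \|\hat{d}_k(x^*)\|^2 - 2\alpha \big(f(\hat{x}_k) - f(x^*)\big) + 2m\alpha\mu \max_{x,y \in X} \|x - y\| + \alpha^2 \Big( \sum_{i=1}^m C_i + m\nu \Big)^2.$$

Observe that when $\hat{x}_k \notin L_N$,

$$f(\hat{x}_k) - f^* \ge \frac{1}{N} + m\mu \max_{x,y \in X} \|x - y\| + \frac{\alpha}{2} \Big( \sum_{i=1}^m C_i + m\nu \Big)^2.$$

Therefore, by combining the preceding two relations, we obtain for $\hat{x}_k \notin L_N$,

$$E[\|\hat{d}_{k+1}(x^*)\|^2 \mid F_k^m] \le \|\hat{d}_k(x^*)\|^2 - \frac{2\alpha}{N}. \tag{3.15}$$

Therefore, from (3.14) and (3.15), we can write

$$E[\|\hat{d}_{k+1}(x^*)\|^2 \mid F_k^m] \le \|\hat{d}_k(x^*)\|^2 - \Delta_{k+1}, \tag{3.16}$$

where

$$\Delta_{k+1} = \begin{cases} 0 & \text{if } \hat{x}_k \in L_N, \\ \frac{2\alpha}{N} & \text{if } \hat{x}_k \notin L_N. \end{cases}$$

Observe that (3.16) satisfies the conditions of Lemma 3.2 with $u_k = \|\hat{d}_k(x^*)\|^2$, $F_k = F_k^m$,
$q_k = 0$, $v_k = \Delta_{k+1}$ and $w_k = 0$. Therefore, it follows that with probability 1,

$$\sum_{k=0}^\infty \Delta_{k+1} < \infty.$$

However, this is possible only if $\Delta_k = 0$ for all $k$ sufficiently large. Therefore, with
probability 1, we have $\hat{x}_k \in L_N$ for all sufficiently large $k$. By letting $N \to \infty$, we
obtain (3.12). □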

As seen from relation (3.12) of Theorem 3.4, the error bound on the "best function"
value $\inf_k f(x_k)$ depends on the step-size $\alpha$, and the bounds $\mu$ and $\nu$ for the
moments of the subgradient errors $\epsilon_{i,k}$. When the errors $\epsilon_{i,k}$ have zero mean, the results
of Theorem 3.4 hold with $\mu = 0$. The resulting error bound is $\frac{\alpha}{2} \big( \sum_{i=1}^m C_i + m\nu \big)^2$,
which can be controlled with the step-size $\alpha$. Moreover, this result also holds when the
boundedness of $X$ is relaxed by requiring subgradient boundedness instead, as seen
in the following theorem. The proof of this theorem is similar to that of Theorem 3.4,
with some extra details to account for the possibility that the optimal set $X^*$ may be
empty.
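
To make the role of the step-size concrete, the following minimal sketch evaluates the gap $m\mu \max\|x-y\| + \frac{\alpha}{2}(\sum_i C_i + m\nu)^2$ from (3.12) for a few values of $\alpha$. The function name and the numerical values are illustrative assumptions, not part of the paper.

```python
def cyclic_constant_step_gap(alpha, C, mu, nu, diameter):
    """Gap above f* in (3.12): m*mu*D + (alpha/2) * (sum_i C_i + m*nu)**2.

    C        : list of subgradient bounds C_i, one per agent (assumed known)
    mu, nu   : bounds on the first and second error moments
    diameter : max over x, y in X of ||x - y||
    """
    m = len(C)
    return m * mu * diameter + 0.5 * alpha * (sum(C) + m * nu) ** 2


# Hypothetical three-agent example with zero-mean errors (mu = 0).
C = [1.0, 2.0, 1.0]
for alpha in (0.1, 0.01, 0.001):
    print(alpha, cyclic_constant_step_gap(alpha, C, mu=0.0, nu=0.5, diameter=10.0))
```

With $\mu = 0$ the gap scales linearly in $\alpha$; with $\mu > 0$ the term $m\mu \max\|x-y\|$ remains no matter how small the step-size is.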

Theorem 3.5. Let Assumptions 1, 2 and 3 hold. Let the sequence $\{x_k\}$ be
generated by the method (3.1) with a constant step-size rule, i.e., $\alpha_k = \alpha$ for all
$k \in \mathbb{N}$. Also, assume that the subgradient errors have zero mean and bounded
second moments, i.e.,

$$\mu_k = 0 \ \text{ for all } k \ge 1, \qquad \nu = \sup_{k \ge 1} \nu_k < \infty.$$

We then have

$$\liminf_{k \to \infty} E[f(x_k)] \le f^* + \frac{\alpha}{2} \Big( \sum_{i=1}^m C_i + m\nu \Big)^2, \tag{3.17}$$

and with probability 1,

$$\inf_k f(x_k) \le f^* + \frac{\alpha}{2} \Big( \sum_{i=1}^m C_i + m\nu \Big)^2. \tag{3.18}$$

Proof. All the assumptions of Lemma 3.1 are satisfied. Since $\mu_k = 0$ and $\nu_k \le \nu$
for all $k$, according to the relation of Lemma 3.1, we have for any $y \in X$,

$$E[\|d_{k+1}(y)\|^2 \mid F_k^m] \le \|d_k(y)\|^2 - 2\alpha \big(f(x_k) - f(y)\big) + \alpha^2 \Big( \sum_{i=1}^m C_i + m\nu \Big)^2. \tag{3.19}$$

By taking the total expectation, we obtain for all $y \in X$ and all $k$,

$$E[\|d_{k+1}(y)\|^2] \le E[\|d_k(y)\|^2] - 2\alpha \big(E[f(x_k)] - f(y)\big) + \alpha^2 \Big( \sum_{i=1}^m C_i + m\nu \Big)^2. \tag{3.20}$$

Assume now that the relation (3.17) is not valid. Then there will exist a $\gamma > 0$ and
an index $k_\gamma$ such that for all $k > k_\gamma$,

$$E[f(x_k)] \ge f^* + 2\gamma + \frac{\alpha}{2} \Big( \sum_{i=1}^m C_i + m\nu \Big)^2.$$

Let $y_\gamma \in X$ be such that $f(y_\gamma) \le f^* + \gamma$. Therefore, for $k > k_\gamma$, we have

$$E[f(x_k)] - f(y_\gamma) \ge \gamma + \frac{\alpha}{2} \Big( \sum_{i=1}^m C_i + m\nu \Big)^2.$$

Fix $y = y_\gamma$ in (3.20); in a manner identical to the proof of Theorem 3.4, we can
obtain a contradiction.

To prove the relation in (3.18), we use a line of analysis similar to that of the
proof of Theorem 3.4, where we define the set

$$L_N = \Big\{ x \in X : f(x) < f^* + \frac{2}{N} + \frac{\alpha}{2} \Big( \sum_{i=1}^m C_i + m\nu \Big)^2 \Big\}.$$

We let $y_N \in L_N$ be such that

$$f(y_N) \le f^* + \frac{1}{N},$$

and consider the sequence $\hat{x}_k$ defined as follows:

$$\hat{x}_{k+1} = \begin{cases} x_{k+1} & \text{if } \hat{x}_k \notin L_N, \\ y_N & \text{if } \hat{x}_k \in L_N. \end{cases}$$

As in the proof of Theorem 3.4, we can show that the sequence $\{x_k\}$ enters the set
$L_N$, for any $N$. We then obtain the result by taking the limit $N \to \infty$. □

In the absence of errors ($\nu = 0$), the error bound of Theorem 3.5 reduces to

$$f^* + \frac{\alpha}{2} \Big( \sum_{i=1}^m C_i \Big)^2,$$

which coincides with the error bound for the cyclic incremental subgradient method
(without errors) established in [20], Proposition 2.1.

4. Markov Randomized Incremental Subgradient Method. While the
method of Section 3 is implementable in networks with a ring structure (the agents
form a cycle), the method of this section is implementable in networks with an arbitrary
connectivity structure. The idea is to implement the incremental algorithm by
allowing agents to communicate only with their neighbors. In particular, suppose at
time $k$, an agent $j$ updates and generates the estimate $x_k$. Then, agent $j$ may pass
this estimate to his neighboring agent $i$ with probability $[P(k)]_{j,i}$. If agent $j$ is not a
neighbor of $i$, then this probability is 0. Formally, the update rule for this method is
given by

$$x_{k+1} = P_X\big[ x_k - \alpha_{k+1} \nabla f_{s(k+1)}(x_k) - \alpha_{k+1} \epsilon_{s(k+1),k+1} \big], \tag{4.1}$$

where $x_0 \in X$ is some random initial vector, $\epsilon_{s(k+1),k+1}$ is a random noise vector and
$\alpha_{k+1} > 0$ is the step-size. The sequence of indices of agents updating in time evolves
according to a time non-homogeneous Markov chain with states $1, \ldots, m$. We let
$P(k)$ denote the transition matrix of this chain at time $k$, i.e.,

$$[P(k)]_{i,j} = \mathrm{Prob}\{ s(k+1) = j \mid s(k) = i \} \quad \text{for all } i, j \in \{1, \ldots, m\}.$$

In the absence of subgradient errors ($\epsilon_{s(k),k} = 0$), when the probabilities $[P(k)]_{i,j}$
are all equal, i.e., $[P(k)]_{i,j} = \frac{1}{m}$ for all $i, j$ and all $k$, the method reduces to the
incremental method with randomization proposed in [20], which is applicable only to
the agent networks that are fully connected.
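
As an illustration of the update rule, here is a minimal one-dimensional sketch. It assumes component functions $f_i(x) = |x - c_i|$, an interval constraint set, a time-invariant transition matrix, and Gaussian subgradient noise; all of these choices, and every name in the code, are assumptions made for the example rather than anything prescribed by the paper.

```python
import random


def project(x, lo, hi):
    """Euclidean projection onto the interval X = [lo, hi]."""
    return min(max(x, lo), hi)


def markov_incremental(centers, P, x0, lo, hi, steps,
                       a=1.0, p=0.75, noise_std=0.0, seed=0):
    """Run x_{k+1} = P_X[x_k - alpha_{k+1} (g_{s(k+1)}(x_k) + eps)] for f_i(x) = |x - c_i|.

    P is a fixed row-stochastic matrix (list of rows); the agent index follows
    the Markov chain it defines. Returns the last iterate and the best value of
    f = sum_i f_i seen along the way (including at x0).
    """
    rng = random.Random(seed)
    m = len(centers)

    def f(x):
        return sum(abs(x - c) for c in centers)

    s = rng.randrange(m)              # arbitrary initial agent
    x, best = x0, f(x0)
    for k in range(steps):
        # Draw s(k+1) from row s(k) of the transition matrix.
        s = rng.choices(range(m), weights=P[s])[0]
        g = (x > centers[s]) - (x < centers[s])   # a subgradient of |x - c_s|
        eps = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
        alpha = a / (k + 1) ** p                  # diminishing step-size a / k^p
        x = project(x - alpha * (g + eps), lo, hi)
        best = min(best, f(x))
    return x, best


# Hypothetical three-agent network with a symmetric (doubly stochastic) matrix.
P = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
x_last, f_best = markov_incremental([0.0, 1.0, 5.0], P, x0=4.0, lo=-10.0, hi=10.0,
                                    steps=2000, noise_std=0.1)
print(x_last, f_best)
```

Only the current agent's function is touched at each step, which is what makes the scheme implementable when each agent can talk only to its neighbors.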

We note here that the time non-homogeneous Markov chain models networks 
where the set of neighbors of an agent may change in time (as the network may 
be mobile or for other reasons). We will also assume that the agents decide the 
probabilities with which they communicate with their neighbors, i.e., at time $k$, the
agent $j$ chooses the probabilities $[P(k)]_{j,i} \ge 0$ for his neighbors $i$.

The main difficulty in the analysis of the method in (4.1) comes from the dependence
between the random agent index $s(k+1)$ and the iterate $x_k$. Assuming that the
Markov chain is ergodic with the uniform steady-state distribution, in the absence of
the errors (i.e., $\epsilon_{s(k),k} = 0$), it is intuitively possible that the method uses directions
that approximate the subgradient $\frac{1}{m} \sum_{i=1}^m \nabla f_i(x_k)$ in the steady state. This is the
basic underlying idea that we exploit in our analysis.

For this idea to work, it is crucial not only that the Markov chain probabilities
converge to a uniform distribution but also that the convergence rate estimate is
available in an explicit form. The uniform steady state requirement is natural since
it corresponds to each agent updating his objective $f_i$ with the same steady state
frequency, thus ensuring that the agents cooperatively minimize the overall network
objective function $f(x) = \sum_{i=1}^m f_i(x)$, and not a weighted sum. We use the rate
estimate of the convergence of the products $P(\ell) \cdots P(k)$ to determine the step-size
choices that guarantee the convergence of the method in (4.1).

To ensure the desired limiting behavior of the Markov chain probabilities, we use
the following two assumptions on the matrices $[P(k)]$.

Assumption 4. Let $V = \{1, \ldots, m\}$. Let $E(k)$ be the set of edges $(i, j)$ induced
by the positive entries of the probability matrix $P(k)$, i.e.,

$$E(k) = \{ (i, j) \mid [P(k)]_{i,j} > 0 \}.$$

There exists an integer $Q \ge 1$ such that the graph $\big( V, \cup_{l=k}^{k+Q-1} E(l) \big)$ is strongly
connected for all $k$.
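
The connectivity condition can be checked mechanically on a finite stretch of transition matrices. The sketch below is only an illustration under that finite-horizon assumption (the assumption itself concerns all $k$); the helper names are hypothetical.

```python
def edges(P):
    """Edge set E(k) = {(i, j) : [P(k)]_{i,j} > 0} of one transition matrix."""
    m = len(P)
    return {(i, j) for i in range(m) for j in range(m) if P[i][j] > 0}


def reachable(adj, start):
    """Set of nodes reachable from `start` by depth-first search."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def strongly_connected(m, E):
    """A digraph is strongly connected iff node 0 reaches every node
    both in the graph and in the graph with all edges reversed."""
    fwd = {i: [] for i in range(m)}
    bwd = {i: [] for i in range(m)}
    for i, j in E:
        fwd[i].append(j)
        bwd[j].append(i)
    return len(reachable(fwd, 0)) == m and len(reachable(bwd, 0)) == m


def windows_strongly_connected(P_seq, Q):
    """Check that every window of Q consecutive matrices in P_seq has a
    strongly connected union graph."""
    m = len(P_seq[0])
    for k in range(len(P_seq) - Q + 1):
        union = set().union(*(edges(P) for P in P_seq[k:k + Q]))
        if not strongly_connected(m, union):
            return False
    return True


# Hypothetical mobile network: agents 0-1 talk at even times, 1-2 at odd times.
A = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
B = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.5, 0.5]]
print(windows_strongly_connected([A, B] * 5, Q=1))
print(windows_strongly_connected([A, B] * 5, Q=2))
```

In this example no single matrix connects the network, but any two consecutive ones do, so the window length $Q = 2$ suffices on the checked horizon.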

Generally speaking, Assumption 4 ensures that each agent has a "chance" to
update the estimate once within a finite time interval. By itself, it does not guarantee
that each agent updates the estimate $x_k$ with the same frequency in the long run. This is
ensured by the following assumption.

Assumption 5. 

(a) The diagonal entries of P(k) are all positive for each k. 

(b) All positive entries of $[P(k)]$ are uniformly bounded away from zero, i.e., there
exists a scalar $\eta > 0$ such that for all $i, j \in \{1, \ldots, m\}$ and all $k$,

if $[P(k)]_{i,j} > 0$, then $[P(k)]_{i,j} \ge \eta$.

(c) The matrix P(k) is doubly stochastic for each k, i.e., the sum of the entries 
in every row and every column is equal to 1. 

Assumptions 5(a) and (b) ensure that the information from each and every agent
is persistent in time. Assumption 5(c) ensures that the limiting Markov chain probability
distribution (if one exists) is uniform. Assumptions 4 and 5 together guarantee
the existence of the uniform limiting distribution, as shown in [25]. We state this
result in the next section.

Note that the cyclic incremental algorithm (2.2) does not satisfy Assumption 5.
The transition probability matrix corresponding to the cyclic incremental method is
a permutation matrix with the $(i, i)$-th entry being zero when agent $i$ updates at time
$k$. Thus, the positive-diagonal requirement of Assumption 5(a) is violated.

We now provide some examples of transition matrices $[P(k)]$ satisfying Assumption 5.
The second and third examples are variations of the Metropolis-Hastings
weights [8,39], defined in terms of the agent neighbors. We let $N_i(k) \subset \{1, \ldots, m\}$ be
the set of neighbors of an agent $i$ at time $k$, and let $|N_i(k)|$ be the cardinality of this
set. Consider the following rules:

• Equal probability scheme. The probabilities that agent $i$ uses at time $k$ are

$$[P(k)]_{i,j} = \begin{cases} \frac{1}{m} & \text{if } j \ne i \text{ and } j \in N_i(k), \\ 1 - \frac{|N_i(k)|}{m} & \text{if } j = i, \\ 0 & \text{otherwise.} \end{cases}$$

• Min-equal neighbor scheme. The probabilities that agent $i$ uses at time $k$ are

$$[P(k)]_{i,j} = \begin{cases} \min\Big\{ \frac{1}{|N_i(k)|+1}, \frac{1}{|N_j(k)|+1} \Big\} & \text{if } j \ne i \text{ and } j \in N_i(k), \\ 1 - \sum_{j \in N_i(k)} \min\Big\{ \frac{1}{|N_i(k)|+1}, \frac{1}{|N_j(k)|+1} \Big\} & \text{if } j = i, \\ 0 & \text{otherwise.} \end{cases}$$

• Weighted Metropolis-Hastings scheme. The probabilities that agent $i$ uses at
time $k$ are given by

$$[P(k)]_{i,j} = \begin{cases} \eta_i \min\Big\{ \frac{1}{|N_i(k)|}, \frac{1}{|N_j(k)|} \Big\} & \text{if } j \ne i \text{ and } j \in N_i(k), \\ 1 - \eta_i \sum_{j \in N_i(k)} \min\Big\{ \frac{1}{|N_i(k)|}, \frac{1}{|N_j(k)|} \Big\} & \text{if } j = i, \\ 0 & \text{otherwise,} \end{cases}$$

where the scalar $\eta_i > 0$ is known only to agent $i$.

In the first example, the parameter $\eta$ can be defined as $\eta = \frac{1}{m}$. In the second
example, $\eta$ can be defined as

$$\eta = \min_{i,j} \Big\{ \frac{1}{|N_i(k)|+1}, \frac{1}{|N_j(k)|+1} \Big\},$$

while in the third example, it can be defined as

$$\eta = \min_i \{ \eta_i, 1 - \eta_i \} \, \min_{i,j} \Big\{ \frac{1}{|N_i(k)|}, \frac{1}{|N_j(k)|} \Big\}.$$

Furthermore, note that in the first example, each agent knows the size of the
network and no additional coordination with the other agents is needed. In the other
two examples, an agent must be aware of the number of the neighbors each of his
neighbors has at any time.
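
The first two schemes can be sketched directly from neighbor lists. The code below is an illustration that assumes a symmetric neighbor relation ($j \in N_i$ exactly when $i \in N_j$) and that $i \notin N_i$; the function names are hypothetical. Exact rational arithmetic is used so that the doubly stochastic check is not blurred by rounding.

```python
from fractions import Fraction


def equal_probability(neighbors):
    """[P]_{i,j} = 1/m for neighbors j of i, and 1 - |N_i|/m on the diagonal."""
    m = len(neighbors)
    P = [[Fraction(0)] * m for _ in range(m)]
    for i, Ni in enumerate(neighbors):
        for j in Ni:
            P[i][j] = Fraction(1, m)
        P[i][i] = 1 - Fraction(len(Ni), m)
    return P


def min_equal_neighbor(neighbors):
    """[P]_{i,j} = min{1/(|N_i|+1), 1/(|N_j|+1)} for neighbors j of i;
    the diagonal entry takes whatever probability mass is left."""
    m = len(neighbors)
    P = [[Fraction(0)] * m for _ in range(m)]
    for i, Ni in enumerate(neighbors):
        for j in Ni:
            P[i][j] = min(Fraction(1, len(Ni) + 1),
                          Fraction(1, len(neighbors[j]) + 1))
        P[i][i] = 1 - sum(P[i][j] for j in Ni)
    return P


def is_doubly_stochastic(P):
    """True when every row sum and every column sum equals 1."""
    m = len(P)
    rows_ok = all(sum(row) == 1 for row in P)
    cols_ok = all(sum(P[i][j] for i in range(m)) == 1 for j in range(m))
    return rows_ok and cols_ok


# Hypothetical path network 0 - 1 - 2.
neighbors = [[1], [0, 2], [1]]
for scheme in (equal_probability, min_equal_neighbor):
    P = scheme(neighbors)
    print(scheme.__name__, is_doubly_stochastic(P), P)
```

Both rules produce symmetric matrices on a symmetric neighbor graph, which is why the column sums come out equal to 1 without any coordination beyond neighbor counts.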

4.1. Preliminaries. We first state a result from [23] for future reference. The result
captures the convergence and the rate of convergence of the time non-homogeneous
Markov chain to its steady state. Define $\Phi(k, l)$, with $k \ge l$, to be the transition
probability matrix for the Markov chain from time $l$ to $k$, i.e., $\Phi(k, l) = P(l) \cdots P(k)$ with
$k \ge l$. Then, we have the following convergence result for the transition matrices.

Lemma 4.1. Assume the matrices $P(k)$ satisfy Assumptions 4 and 5. Then:

1. $\lim_{k \to \infty} \Phi(k, s) = \frac{1}{m} e e^T$ for all $s$.

2. The convergence is geometric and the rate of convergence is given by

$$\Big| [\Phi(k, l)]_{i,j} - \frac{1}{m} \Big| \le b \beta^{k-l} \quad \text{for all } k \text{ and } l \text{ with } k \ge l \ge 0,$$

where

$$b = \Big( 1 - \frac{\eta}{4m^2} \Big)^{-2} \quad \text{and} \quad \beta = \Big( 1 - \frac{\eta}{4m^2} \Big)^{\frac{1}{Q}}.$$
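
The geometric convergence of the matrix products can be observed numerically. The sketch below multiplies a fixed doubly stochastic matrix with itself and tracks the largest deviation of an entry from $\frac{1}{m}$; it also evaluates constants of the form $b = (1 - \eta/(4m^2))^{-2}$ and $\beta = (1 - \eta/(4m^2))^{1/Q}$. The time-invariant matrix, the values $\eta = 0.25$, $Q = 1$, and all function names are assumptions chosen for the illustration.

```python
def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    m = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(m)) for j in range(m)]
            for i in range(m)]


def deviations_from_uniform(P_seq):
    """For each n, max_{i,j} |[P_0 P_1 ... P_n]_{i,j} - 1/m|."""
    m = len(P_seq[0])
    Phi = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    devs = []
    for P in P_seq:
        Phi = matmul(Phi, P)
        devs.append(max(abs(Phi[i][j] - 1.0 / m)
                        for i in range(m) for j in range(m)))
    return devs


def geometric_rate_constants(eta, m, Q):
    """Constants b and beta built from eta, m and Q as in the rate estimate."""
    base = 1.0 - eta / (4.0 * m * m)
    return base ** -2, base ** (1.0 / Q)


# Hypothetical lazy random walk on a fully connected three-agent network.
P = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
devs = deviations_from_uniform([P] * 20)
b, beta = geometric_rate_constants(eta=0.25, m=3, Q=1)
for n in (0, 5, 10, 19):
    print(n, devs[n], b * beta ** n)
```

For this particular matrix the actual decay is much faster than the worst-case estimate, which is expected: the estimate has to cover every matrix sequence satisfying the assumptions.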

We use the estimate of Lemma 4.1 to establish a key relation in Lemma 4.2,
which is repeatedly invoked in our subsequent analysis. The idea behind Lemma 4.2
is the observation that when there are no errors ($\epsilon_{s(k),k} = 0$) and the Markov chain
has a uniform steady state distribution, the directions $\nabla f_{s(k+1)}(x_k)$ used in (4.1) are
approximate subgradients of the function $\frac{1}{m} \sum_{i=1}^m f_i(x)$ at points $x_{n(k)}$ far away
from $x_k$ in the past [i.e., $k \gg n(k)$]. However, even though $x_{n(k)}$ are far away from
$x_k$ in time, their Euclidean distance $\|x_k - x_{n(k)}\|$ can be small when the step-size is
selected appropriately. Overall, this means that each iterate of the method in (4.1) can
be viewed as an approximation of the iteration

$$x_{k+1} = P_X\Big[ x_k - \frac{\alpha_{k+1}}{m} \sum_{i=1}^m \nabla f_i(x_k) \Big],$$

with correlated errors depending on current and past iterates.

In the forthcoming lemma and thereafter, we let $G_k$ denote the entire history
of the method up to time $k$, i.e., the $\sigma$-field generated by the initial vector $x_0$ and
$\{ s(n), \epsilon_{s(n),n} ;\ 0 \le n \le k \}$.

Lemma 4.2. Let Assumptions 1, 2, 3, 4 and 5 hold. Then, the iterates generated by algorithm (4.1)
are such that for any step-size rule, for any $y \in X$, and any non-negative integer
sequence $\{n(k)\}$, $n(k) \le k$, we have

$$E[\|d_{k+1}(y)\|^2 \mid G_{n(k)}] \le E[\|d_k(y)\|^2 \mid G_{n(k)}] - \frac{2\alpha_{k+1}}{m} \big( f(x_{n(k)}) - f(y) \big)$$
$$\quad + 2b \Big( \sum_{i=1}^m C_i \Big) \alpha_{k+1} \beta^{k+1-n(k)} \|d_{n(k)}(y)\|$$
$$\quad + 2C\alpha_{k+1} \sum_{l=n(k)}^{k-1} \alpha_{l+1} (C + \nu_{l+1})$$
$$\quad + 2\alpha_{k+1} \mu_{k+1} E[\|d_k(y)\| \mid G_{n(k)}] + \alpha_{k+1}^2 (\nu_k + C)^2,$$

where $d_k(y) = x_k - y$ and $C = \max_{1 \le i \le m} C_i$.

Proof. Using the iterate update rule in (4.1) and the non-expansive property of
the Euclidean projection, we obtain for any $y \in X$ and $k \ge 0$,

$$\|d_{k+1}(y)\|^2 = \big\| P_X[ x_k - \alpha_{k+1} \nabla f_{s(k+1)}(x_k) - \alpha_{k+1} \epsilon_{s(k+1),k+1} ] - y \big\|^2$$
$$\le \| x_k - \alpha_{k+1} \nabla f_{s(k+1)}(x_k) - \alpha_{k+1} \epsilon_{s(k+1),k+1} - y \|^2$$
$$= \|d_k(y)\|^2 - 2\alpha_{k+1} d_k(y)^T \nabla f_{s(k+1)}(x_k) - 2\alpha_{k+1} d_k(y)^T \epsilon_{s(k+1),k+1} + \alpha_{k+1}^2 \| \epsilon_{s(k+1),k+1} + \nabla f_{s(k+1)}(x_k) \|^2.$$

Using the subgradient inequality in (2.4) to bound $d_k(y)^T \nabla f_{s(k+1)}(x_k)$, we get

$$\|d_{k+1}(y)\|^2 \le \|d_k(y)\|^2 - 2\alpha_{k+1} \big( f_{s(k+1)}(x_k) - f_{s(k+1)}(y) \big) - 2\alpha_{k+1} d_k(y)^T \epsilon_{s(k+1),k+1} + \alpha_{k+1}^2 \| \epsilon_{s(k+1),k+1} + \nabla f_{s(k+1)}(x_k) \|^2$$
$$= \|d_k(y)\|^2 - 2\alpha_{k+1} \big( f_{s(k+1)}(x_k) - f_{s(k+1)}(x_{n(k)}) \big) - 2\alpha_{k+1} \big( f_{s(k+1)}(x_{n(k)}) - f_{s(k+1)}(y) \big)$$
$$\quad - 2\alpha_{k+1} d_k(y)^T \epsilon_{s(k+1),k+1} + \alpha_{k+1}^2 \| \epsilon_{s(k+1),k+1} + \nabla f_{s(k+1)}(x_k) \|^2.$$

Taking conditional expectations with respect to the $\sigma$-field $G_{n(k)}$, we obtain

$$E[\|d_{k+1}(y)\|^2 \mid G_{n(k)}] \le E[\|d_k(y)\|^2 \mid G_{n(k)}]$$
$$\quad - 2\alpha_{k+1} E[ f_{s(k+1)}(x_k) - f_{s(k+1)}(x_{n(k)}) \mid G_{n(k)} ]$$
$$\quad - 2\alpha_{k+1} E[ f_{s(k+1)}(x_{n(k)}) - f_{s(k+1)}(y) \mid G_{n(k)} ]$$
$$\quad - 2\alpha_{k+1} E[ d_k(y)^T \epsilon_{s(k+1),k+1} \mid G_{n(k)} ]$$
$$\quad + \alpha_{k+1}^2 E\big[ \| \epsilon_{s(k+1),k+1} + \nabla f_{s(k+1)}(x_k) \|^2 \mid G_{n(k)} \big]. \tag{4.2}$$



We next use the subgradient inequality in (2.4) to estimate the second term in the
preceding relation:

$$E[ f_{s(k+1)}(x_k) - f_{s(k+1)}(x_{n(k)}) \mid G_{n(k)} ] \ge E\big[ \nabla f_{s(k+1)}(x_{n(k)})^T (x_k - x_{n(k)}) \mid G_{n(k)} \big]$$
$$\ge -E\big[ \| \nabla f_{s(k+1)}(x_{n(k)}) \| \, \| x_{n(k)} - x_k \| \mid G_{n(k)} \big]$$
$$\ge -C\, E[ \| x_{n(k)} - x_k \| \mid G_{n(k)} ]. \tag{4.3}$$

In the last step we have used the subgradient boundedness from Assumption 3 to
bound the subgradient norms $\| \nabla f_{s(k+1)}(x_{n(k)}) \|$ by $C = \max_{1 \le i \le m} C_i$. We estimate
$E[ \| x_{n(k)} - x_k \| \mid G_{n(k)} ]$ from the iterate update rule (4.1) and the non-expansive
property of the Euclidean projection as follows:

$$E[ \| x_{n(k)} - x_k \| \mid G_{n(k)} ] \le \sum_{l=n(k)}^{k-1} E[ \| x_{l+1} - x_l \| \mid G_{n(k)} ]$$
$$\le \sum_{l=n(k)}^{k-1} \alpha_{l+1} E[ \| \nabla f_{s(l+1)}(x_l) + \epsilon_{s(l+1),l+1} \| \mid G_{n(k)} ]$$
$$\le \sum_{l=n(k)}^{k-1} \alpha_{l+1} E[ \| \nabla f_{s(l+1)}(x_l) \| + \| \epsilon_{s(l+1),l+1} \| \mid G_{n(k)} ]$$
$$\le \sum_{l=n(k)}^{k-1} \alpha_{l+1} (C + \nu_{l+1}),$$

where in the last step we have used the boundedness of subgradients and of the second
moments of $\epsilon_{i,k}$ [cf. Eq. (3.2)]. From the preceding relation and Eq. (4.3), we obtain

$$E[ f_{s(k+1)}(x_k) - f_{s(k+1)}(x_{n(k)}) \mid G_{n(k)} ] \ge -C \sum_{l=n(k)}^{k-1} \alpha_{l+1} (C + \nu_{l+1}).$$

By substituting the preceding estimate in (4.2), we further obtain

$$E[\|d_{k+1}(y)\|^2 \mid G_{n(k)}] \le E[\|d_k(y)\|^2 \mid G_{n(k)}] + 2C\alpha_{k+1} \sum_{l=n(k)}^{k-1} \alpha_{l+1} (C + \nu_{l+1})$$
$$\quad - 2\alpha_{k+1} E[ f_{s(k+1)}(x_{n(k)}) - f_{s(k+1)}(y) \mid G_{n(k)} ]$$
$$\quad - 2\alpha_{k+1} E[ d_k(y)^T \epsilon_{s(k+1),k+1} \mid G_{n(k)} ]$$
$$\quad + \alpha_{k+1}^2 E\big[ \| \epsilon_{s(k+1),k+1} + \nabla f_{s(k+1)}(x_k) \|^2 \mid G_{n(k)} \big]. \tag{4.4}$$

We estimate the last term in (4.4) by using the subgradient boundedness of Assumption 3
and the boundedness of the second moments of $\epsilon_{i,k}$ [cf. Eq. (3.2)], as follows:

$$E\big[ \| \epsilon_{s(k+1),k+1} + \nabla f_{s(k+1)}(x_k) \|^2 \mid G_{n(k)} \big]$$
$$\le E\big[ \| \epsilon_{s(k+1),k+1} \|^2 + \| \nabla f_{s(k+1)}(x_k) \|^2 + 2 \| \epsilon_{s(k+1),k+1} \| \, \| \nabla f_{s(k+1)}(x_k) \| \mid G_{n(k)} \big]$$
$$\le \nu_k^2 + C^2 + 2\nu_k C = (\nu_k + C)^2.$$

Substituting the preceding estimate in Eq. (4.4), we have

$$E[\|d_{k+1}(y)\|^2 \mid G_{n(k)}] \le E[\|d_k(y)\|^2 \mid G_{n(k)}] + 2C\alpha_{k+1} \sum_{l=n(k)}^{k-1} \alpha_{l+1} (C + \nu_{l+1})$$
$$\quad - 2\alpha_{k+1} E[ f_{s(k+1)}(x_{n(k)}) - f_{s(k+1)}(y) \mid G_{n(k)} ]$$
$$\quad - 2\alpha_{k+1} E[ d_k(y)^T \epsilon_{s(k+1),k+1} \mid G_{n(k)} ]$$
$$\quad + \alpha_{k+1}^2 (\nu_k + C)^2. \tag{4.5}$$

We next estimate the term $E[ d_k(y)^T \epsilon_{s(k+1),k+1} \mid G_{n(k)} ]$. Since $G_{n(k)} \subset G_k$ and $d_k(y)$
is $G_k$-measurable, we have

$$E[ d_k(y)^T \epsilon_{s(k+1),k+1} \mid G_{n(k)} ] = E\big[ E[ d_k(y)^T \epsilon_{s(k+1),k+1} \mid G_k ] \mid G_{n(k)} \big]$$
$$= E\big[ d_k(y)^T E[ \epsilon_{s(k+1),k+1} \mid G_k ] \mid G_{n(k)} \big]$$
$$\ge -E\big[ \|d_k(y)\| \, \| E[ \epsilon_{s(k+1),k+1} \mid G_k ] \| \mid G_{n(k)} \big]$$
$$\ge -\mu_{k+1} E[ \|d_k(y)\| \mid G_{n(k)} ],$$

where the first equality follows from the law of iterated conditioning. Using the
preceding estimate in (4.5), we obtain

$$E[\|d_{k+1}(y)\|^2 \mid G_{n(k)}] \le E[\|d_k(y)\|^2 \mid G_{n(k)}] + 2C\alpha_{k+1} \sum_{l=n(k)}^{k-1} \alpha_{l+1} (C + \nu_{l+1})$$
$$\quad - 2\alpha_{k+1} E[ f_{s(k+1)}(x_{n(k)}) - f_{s(k+1)}(y) \mid G_{n(k)} ]$$
$$\quad + 2\alpha_{k+1} \mu_{k+1} E[ \|d_k(y)\| \mid G_{n(k)} ] + \alpha_{k+1}^2 (\nu_k + C)^2. \tag{4.6}$$

Finally, we consider the term $E[ f_{s(k+1)}(x_{n(k)}) - f_{s(k+1)}(y) \mid G_{n(k)} ]$, and we use
the fact that the probability transition matrix for the Markov chain $\{s(k)\}$ from time
$n(k)$ to time $k$ is $\Phi(k+1, n(k)) = P(n(k)) \cdots P(k)$. We have

$$E[ f_{s(k+1)}(x_{n(k)}) - f_{s(k+1)}(y) \mid G_{n(k)} ] = \sum_{i=1}^m [\Phi(k+1, n(k))]_{s(n(k)),i} \big( f_i(x_{n(k)}) - f_i(y) \big)$$
$$\ge \sum_{i=1}^m \frac{1}{m} \big( f_i(x_{n(k)}) - f_i(y) \big) - \sum_{i=1}^m \Big| [\Phi(k+1, n(k))]_{s(n(k)),i} - \frac{1}{m} \Big| \, \big| f_i(x_{n(k)}) - f_i(y) \big|$$
$$\ge \frac{1}{m} \big( f(x_{n(k)}) - f(y) \big) - b \beta^{k+1-n(k)} \sum_{i=1}^m \big| f_i(x_{n(k)}) - f_i(y) \big|, \tag{4.7}$$

where at the last step we have used Lemma 4.1. Using the subgradient inequality
(2.4), we further have

$$\big| f_i(x_{n(k)}) - f_i(y) \big| \le C_i \| x_{n(k)} - y \| = C_i \| d_{n(k)}(y) \|. \tag{4.8}$$

The result now follows by combining the relations in Eqs. (4.6), (4.7) and (4.8). □



4.2. Convergence for diminishing step-size. In this section, we establish the
convergence of the Markov randomized method in (4.1) for a diminishing step-size.
Recall that in Theorem 3.3 for the cyclic incremental method, we showed an almost
sure convergence result for a diminishing step-size $\alpha_k$ subject to some conditions that
coordinate the choice of the step-size, and the bounds $\mu_k$ and $\nu_k$ on the moments of
the errors $\epsilon_{i,k}$. To obtain an analogous result for the Markov randomized method, we
use boundedness of the set $X$ and a more restricted step-size. In particular, we consider
a step-size of the form $\alpha_k = \frac{a}{k^p}$ for a range of values of $p$, as seen in the following.

Theorem 4.3. Let Assumptions 1, 2, 4 and 5 hold. Assume that the step-size is
$\alpha_k = \frac{a}{k^p}$, where $a$ and $p$ are positive scalars with $\frac{2}{3} < p \le 1$. In addition, assume that
the bounds $\mu_k$ and $\nu_k$ on the error moments satisfy

$$\sum_{k=1}^\infty \alpha_k \mu_k < \infty, \qquad \nu = \sup_{k \ge 1} \nu_k < \infty.$$

Furthermore, let the set $X$ be bounded. Then, with probability 1, we have

$$\liminf_{k \to \infty} f(x_k) = f^*, \qquad \liminf_{k \to \infty} \mathrm{dist}(x_k, X^*) = 0.$$

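Proof. Since the set $X$ is compact and $f_i$ is convex over $\mathbb{R}^n$, it follows that the
subgradients of $f_i$ are bounded over $X$ for each $i$. Thus, Assumption 3 is satisfied,
and we can use Lemma 4.2.

Since $X$ is compact and $f$ is convex over $\mathbb{R}^n$ (therefore, also continuous), the
optimal set $X^*$ is nonempty, closed and convex. Let $x_k^*$ be the projection of $x_k$ on the
set $X^*$. In Lemma 4.2, we let $y = x_k^*$ and let $n(k) = k + 1 - \lceil k^\gamma \rceil$, where $\gamma > 0$ (to be
specified more precisely later on). Note that $n(k) \le k$ for all $k \ge 1$. Using this and
the relation $\mathrm{dist}(x_{k+1}, X^*) \le \|x_{k+1} - x_k^*\|$, from Lemma 4.2 we obtain for all $k \ge 1$ (see footnote 3),

$$E[ \mathrm{dist}(x_{k+1}, X^*)^2 \mid G_{n(k)} ] \le E[ \mathrm{dist}(x_k, X^*)^2 \mid G_{n(k)} ] - \frac{2\alpha_{k+1}}{m} \big( f(x_{n(k)}) - f^* \big)$$
$$\quad + 2b \Big( \sum_{i=1}^m C_i \Big) \alpha_{k+1} \beta^{\lceil k^\gamma \rceil} \| d_{n(k)}(x_k^*) \|$$
$$\quad + 2C \alpha_{k+1} \alpha_{n(k)+1} (\lceil k^\gamma \rceil - 2) \max_{n(k) \le l < k} (C + \nu_{l+1})$$
$$\quad + 2\alpha_{k+1} \mu_{k+1} E[ \mathrm{dist}(x_k, X^*) \mid G_{n(k)} ] + \alpha_{k+1}^2 (\nu_k + C)^2.$$

Taking expectations and using $\sup_{k \ge 1} \nu_k = \nu$, we obtain for all $k \ge 1$,

$$E[ \mathrm{dist}(x_{k+1}, X^*)^2 ] \le E[ \mathrm{dist}(x_k, X^*)^2 ] - \frac{2\alpha_{k+1}}{m} \big( E[f(x_{n(k)})] - f^* \big) + \tau_{k+1},$$

where

$$\tau_{k+1} = 2b \Big( \sum_{i=1}^m C_i \Big) \alpha_{k+1} \beta^{\lceil k^\gamma \rceil} \| d_{n(k)}(x_k^*) \|$$
$$\quad + 2C(C + \nu) \alpha_{k+1} \alpha_{n(k)+1} (\lceil k^\gamma \rceil - 2)$$
$$\quad + 2\alpha_{k+1} \mu_{k+1} E[ \mathrm{dist}(x_k, X^*) ] + \alpha_{k+1}^2 (\nu_k + C)^2.$$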

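We next show that $\sum_{k=2}^\infty \tau_{k+1} < \infty$. Since $\alpha_k = \frac{a}{k^p}$, we have $\alpha_{k+1} < \alpha_k$ for all
$k \ge 1$. Furthermore, since $\beta < 1$, we have $\beta^{\lceil k^\gamma \rceil} \le \beta^{k^\gamma}$. Therefore, $\alpha_{k+1} \beta^{\lceil k^\gamma \rceil} \le \frac{a \beta^{k^\gamma}}{k^p}$.
By choosing $\gamma > 0$ such that $\gamma \ge 1 - p$, we see that $\frac{1}{k^p} \le \frac{1}{k^{1-\gamma}}$ for all $k \ge 1$. Hence,

$$\sum_{k=2}^\infty \alpha_{k+1} \beta^{\lceil k^\gamma \rceil} \le a \sum_{k=2}^\infty \frac{\beta^{k^\gamma}}{k^{1-\gamma}} \le a \int_1^\infty \frac{\beta^{y^\gamma}}{y^{1-\gamma}} \, dy = -\frac{a\beta}{\gamma \ln \beta}.$$

Since the set $X$ is bounded, it follows that

$$\sum_{k=2}^\infty 2b \Big( \sum_{i=1}^m C_i \Big) \alpha_{k+1} \beta^{\lceil k^\gamma \rceil} \| d_{n(k)}(x_k^*) \| < \infty. \tag{4.9}$$

Footnote 3. The equivalent expression for the case when $k = 1$ is obtained by setting the fourth term to 0.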



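Next, since $\lceil k^\gamma \rceil - 2 < k^\gamma$ for all $k \ge 2$, and since $\alpha_{k+1} < \alpha_k$, $\alpha_k = \frac{a}{k^p}$ and
$n(k) = k + 1 - \lceil k^\gamma \rceil$, it follows that for all $k \ge 2$,

$$\alpha_{k+1} \alpha_{n(k)+1} (\lceil k^\gamma \rceil - 2) < \frac{a^2 k^\gamma}{k^p (k + 2 - \lceil k^\gamma \rceil)^p} \le \frac{a^2 k^\gamma}{k^p (k - k^\gamma)^p} = \frac{a^2 k^\gamma}{k^{2p} (1 - k^{\gamma-1})^p}.$$

By choosing $\gamma > 0$ such that it also satisfies $\gamma < 2p - 1$ (in addition to $\gamma \ge 1 - p$), we
have $\gamma < 1$ (in view of $p \le 1$). Therefore, for all $k \ge 2$,

$$\frac{k^\gamma}{k^{2p} (1 - k^{\gamma-1})^p} \le \frac{1}{(1 - 2^{\gamma-1})^p} \, \frac{1}{k^{2p-\gamma}}.$$

By combining the preceding two relations, we have

$$\sum_{k=2}^\infty 2C(C + \nu) \alpha_{k+1} \alpha_{n(k)+1} (\lceil k^\gamma \rceil - 2) \le 2C(C + \nu) \frac{a^2}{(1 - 2^{\gamma-1})^p} \sum_{k=2}^\infty \frac{1}{k^{2p-\gamma}} < \infty, \tag{4.10}$$

where the finiteness of the last sum follows from $2p - \gamma > 1$.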
Finally, as a consequence of our assumptions, we also have

$$\sum_{k=2}^\infty 2\alpha_{k+1} \mu_{k+1} E[ \mathrm{dist}(x_k, X^*) ] < \infty, \qquad \sum_{k=2}^\infty \alpha_{k+1}^2 (\nu_k + C)^2 < \infty.$$

Thus, from Eqs. (4.9) and (4.10), and the preceding two relations, we see that

$$\sum_{k=2}^\infty \tau_{k+1} < \infty.$$

From the deterministic analog of Lemma 3.2, we conclude that $E[ \mathrm{dist}(x_k, X^*)^2 ]$
converges to a non-negative scalar and

$$\sum_{k=2}^\infty \frac{2\alpha_{k+1}}{m} \big( E[f(x_{n(k)})] - f^* \big) < \infty.$$

Since $p \le 1$, we have $\sum_{k=2}^\infty \alpha_{k+1} = \infty$. Further, since $f(x_{n(k)}) \ge f^*$, it follows that

$$\liminf_{k \to \infty} E[f(x_{n(k)})] = f^*. \tag{4.11}$$

The function $f$ is convex over $\mathbb{R}^n$ and, hence, continuous. Since the set $X$ is bounded,
the function $f(x)$ is also bounded on $X$. Therefore, from Fatou's lemma it follows that

$$E\Big[ \liminf_{k \to \infty} f(x_k) \Big] \le \liminf_{k \to \infty} E[f(x_k)] = f^*,$$

implying that $\liminf_{k \to \infty} f(x_k) = f^*$ with probability 1. Moreover, from this relation,
by the continuity of $f$ and boundedness of $X$, it follows that $\liminf_{k \to \infty} \mathrm{dist}(x_k, X^*) = 0$
with probability 1. □
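
The proof couples the step-size exponent $p$ with the window exponent $\gamma$ through the requirements $\gamma \ge 1 - p$ and $\gamma < 2p - 1$, which can hold together only when $p > \frac{2}{3}$. The sketch below picks the midpoint of the admissible range (an arbitrary choice made here for illustration) and evaluates the look-back index $n(k) = k + 1 - \lceil k^\gamma \rceil$; the function names are hypothetical.

```python
import math


def choose_gamma(p):
    """Return a gamma with 1 - p <= gamma < 2p - 1 and gamma > 0.

    Any value in that range works in the argument; the midpoint is taken
    only for definiteness.
    """
    if not (2.0 / 3.0 < p <= 1.0):
        raise ValueError("the analysis requires 2/3 < p <= 1")
    lo, hi = 1.0 - p, 2.0 * p - 1.0
    return 0.5 * (lo + hi)


def lookback_index(k, gamma):
    """n(k) = k + 1 - ceil(k**gamma): the past time whose history is conditioned on."""
    return k + 1 - math.ceil(k ** gamma)


p = 0.75
gamma = choose_gamma(p)
print(gamma, 2 * p - gamma)            # the second value must exceed 1
print([lookback_index(k, gamma) for k in (1, 10, 100, 1000)])
```

The gap $k - n(k)$ grows like $k^\gamma$, slowly enough that the iterates $x_{n(k)}$ and $x_k$ stay close, yet fast enough that $\beta^{\lceil k^\gamma \rceil}$ is summable against the step-size.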



As seen in the proof of Theorem 4.3, $E[ \mathrm{dist}(x_k, X^*)^2 ]$ converges to a nonnegative
scalar, but we have no guarantee that its limit is zero. However, this can be shown,
for example, for a function with a sharp set of minima, i.e., $f$ satisfying

$$f(x) - f^* \ge \zeta \big( \mathrm{dist}(x, X^*) \big)^\xi \quad \text{for all } x \in X,$$

for some positive scalars $\zeta$ and $\xi$. Under the assumptions of Theorem 4.3, we have
that $\liminf_{k \to \infty} E[f(x_k)] = f^*$ [cf. Eq. (4.11)] and therefore,

$$0 = \liminf_{k \to \infty} E[ f(x_k) - f^* ] \ge \zeta \liminf_{k \to \infty} E\big[ \mathrm{dist}(x_k, X^*)^\xi \big] \ge 0.$$

Hence, $\liminf_{k \to \infty} E[ \mathrm{dist}(x_k, X^*)^2 ] = 0$, and since $E[ \mathrm{dist}(x_k, X^*)^2 ]$ converges, it has
to converge to 0.

4.3. Error bounds for constant step-size. We now establish error bounds 
when the Markov randomized incremental method is used with a constant step-size. 

Theorem 4.4. Let Assumptions 1, 2, 4 and 5 hold. Let the sequence $\{x_k\}$ be
generated by the method (4.1) with a constant step-size rule, i.e., $\alpha_k = \alpha$ for all $k$.
Also, assume that the set $X$ is bounded, and

$$\mu = \sup_{k \ge 1} \mu_k < \infty, \qquad \nu = \sup_{k \ge 1} \nu_k < \infty.$$

Then for any integer $T \ge 0$,

$$\liminf_{k \to \infty} E[f(x_k)] \le f^* + \mu \max_{x,y \in X} \|x - y\| + \frac{1}{2} \alpha (\nu + C)^2 + \alpha T C (C + \nu) + b \Big( \sum_{i=1}^m C_i \Big) \beta^{T+1} \max_{x,y \in X} \|x - y\|, \tag{4.12}$$

where $\beta = \big( 1 - \frac{\eta}{4m^2} \big)^{\frac{1}{Q}}$ and $C = \max_{1 \le i \le m} C_i$. Furthermore, with probability 1, the
same estimate holds for $\inf_k f(x_k)$.



Proof. Since $X$ is compact and each $f_i$ is convex over $\mathbb{R}^n$, the subgradients of $f_i$
are bounded over $X$ for each $i$. Thus, all the assumptions of Lemma 4.2 are satisfied.
Let $T$ be a nonnegative integer and let $n(k) = k - T$. Since $\mu_k \le \mu$ and $\nu_k \le \nu$ for all
$k$, and $\|d_k(y)\| \le \max_{x,y \in X} \|x - y\|$, according to Lemma 4.2, we have for $y = x^* \in X^*$
and $k \ge T$,

$$E[\|d_{k+1}(x^*)\|^2 \mid G_{n(k)}] \le E[\|d_k(x^*)\|^2 \mid G_{n(k)}] - \frac{2\alpha}{m} \big( f(x_{k-T}) - f^* \big)$$
$$\quad + 2b \Big( \sum_{i=1}^m C_i \Big) \alpha \beta^{T+1} \max_{x,y \in X} \|x - y\|$$
$$\quad + 2\alpha^2 T C (C + \nu)$$
$$\quad + 2\alpha\mu \max_{x,y \in X} \|x - y\| + \alpha^2 (\nu + C)^2. \tag{4.13}$$

By taking the total expectation, we obtain for all $x^* \in X^*$ and all $k \ge T$,

$$E[\|d_{k+1}(x^*)\|^2] \le E[\|d_k(x^*)\|^2] - \frac{2\alpha}{m} \big( E[f(x_{k-T})] - f^* \big)$$
$$\quad + 2b \Big( \sum_{i=1}^m C_i \Big) \alpha \beta^{T+1} \max_{x,y \in X} \|x - y\|$$
$$\quad + 2\alpha^2 T C (C + \nu)$$
$$\quad + 2\alpha\mu \max_{x,y \in X} \|x - y\| + \alpha^2 (\nu + C)^2.$$

Now assume that the relation (4.12) does not hold. Then, there will exist a $\gamma > 0$
and an index $k_\gamma \ge T$ such that for all $k \ge k_\gamma$,

$$E[f(x_k)] \ge f^* + \gamma + \mu \max_{x,y \in X} \|x - y\| + \frac{1}{2} \alpha (\nu + C)^2 + \alpha T C (C + \nu) + b \Big( \sum_{i=1}^m C_i \Big) \beta^{T+1} \max_{x,y \in X} \|x - y\|.$$

Therefore, for $k \ge k_\gamma + T$, we have

$$E[\|d_{k+1}(x^*)\|^2] \le E[\|d_k(x^*)\|^2] - 2\alpha\gamma \le \cdots \le E[\|d_{k_\gamma}(x^*)\|^2] - 2\alpha\gamma (k - k_\gamma).$$

For sufficiently large $k$, the right hand side of the preceding relation is negative,
yielding a contradiction. Thus, the relation (4.12) must hold for all $T \ge 0$.
We next show that for any $T \ge 0$,

$$\inf_k f(x_k) \le f^* + \mu \max_{x,y \in X} \|x - y\| + \frac{1}{2} \alpha (\nu + C)^2 + \alpha T C (C + \nu) + b \Big( \sum_{i=1}^m C_i \Big) \beta^{T+1} \max_{x,y \in X} \|x - y\|, \tag{4.14}$$

with probability 1. Define the set

$$L_N = \Big\{ x \in X : f(x) < f^* + \frac{1}{N} + \mu \max_{x,y \in X} \|x - y\| + \frac{1}{2} \alpha (\nu + C)^2 + \alpha T C (C + \nu) + b \Big( \sum_{i=1}^m C_i \Big) \beta^{T+1} \max_{x,y \in X} \|x - y\| \Big\}.$$

Let $x^* \in X^*$ and define the sequence $\hat{x}_k$ as follows:

$$\hat{x}_{k+1} = \begin{cases} x_{k+1} & \text{if } \hat{x}_k \notin L_N, \\ x^* & \text{otherwise.} \end{cases}$$

Thus, the process $\{\hat{x}_k\}$ is identical to the process $\{x_k\}$ until $\{x_k\}$ enters the set $L_N$.
Define

$$\hat{d}_k(y) = \hat{x}_k - y \quad \text{for any } y \in X.$$

Let $k \ge T$. Consider the case when $\hat{x}_k \in L_N$. Then, $\hat{x}_k = x^*$ and $\hat{x}_{k+1} = x^*$, so that
$\hat{d}_k(x^*) = 0$ and $\hat{d}_{k+1}(x^*) = 0$, yielding

$$E[\|\hat{d}_{k+1}(x^*)\|^2 \mid G_{n(k)}] = E[\|\hat{d}_k(x^*)\|^2 \mid G_{n(k)}]. \tag{4.15}$$

Consider now the case when $\hat{x}_k \notin L_N$. Then, $\hat{x}_l = x_l$ and $x_l \notin L_N$ for all $l < k + 1$.
Therefore, by the definition of the set $L_N$, we have

$$f(x_{k-T}) - f^* \ge \frac{1}{N} + \mu \max_{x,y \in X} \|x - y\| + \frac{1}{2} \alpha (\nu + C)^2 + \alpha T C (C + \nu) + b \Big( \sum_{i=1}^m C_i \Big) \beta^{T+1} \max_{x,y \in X} \|x - y\|. \tag{4.16}$$

By using relations (4.13) and (4.16), we conclude that for $\hat{x}_k \notin L_N$,

$$E[\|\hat{d}_{k+1}(x^*)\|^2 \mid G_{n(k)}] \le E[\|\hat{d}_k(x^*)\|^2 \mid G_{n(k)}] - \frac{2\alpha}{N}. \tag{4.17}$$

Therefore, from (4.15) and (4.17), we can write

$$E[\|\hat{d}_{k+1}(x^*)\|^2 \mid G_{n(k)}] \le E[\|\hat{d}_k(x^*)\|^2 \mid G_{n(k)}] - \Delta_{k+1}, \tag{4.18}$$

where

$$\Delta_{k+1} = \begin{cases} 0 & \text{if } \hat{x}_k \in L_N, \\ \frac{2\alpha}{N} & \text{if } \hat{x}_k \notin L_N. \end{cases}$$

Observe that (4.18) satisfies the conditions of Lemma 3.2 with $u_k = E[\|\hat{d}_k(x^*)\|^2 \mid G_{n(k)}]$,
$F_k = G_{n(k)}$, $q_k = 0$, $v_k = \Delta_{k+1}$ and $w_k = 0$. Thus, it follows that with probability 1,

$$\sum_{k=T}^\infty \Delta_{k+1} < \infty.$$

However, this is possible only if $\Delta_k = 0$ for all $k$ sufficiently large. Therefore, with
probability 1, we have $\hat{x}_k \in L_N$ for all sufficiently large $k$. By letting $N \to \infty$, we
obtain (4.14). □

Under the assumptions of Theorem 4.4, the function $f$ is bounded over the set $X$,
and by Fatou's lemma, we have

$$E\Big[ \liminf_{k \to \infty} f(x_k) \Big] \le \liminf_{k \to \infty} E[f(x_k)].$$

It follows that the estimate of Theorem 4.4 also holds for $E[ \liminf_{k \to \infty} f(x_k) ]$.

In the absence of errors ($\mu_k = 0$ and $\nu_k = 0$), the error bound in Theorem 4.4
reduces to

$$f^* + \frac{1}{2} \alpha C^2 + \alpha T C^2 + b \Big( \sum_{i=1}^m C_i \Big) \beta^{T+1} \max_{x,y \in X} \|x - y\|. \tag{4.19}$$

With respect to the parameter $\beta$, the error bound is obviously smallest when $\beta = 0$.
This corresponds to uniform transition matrices $P(k)$, i.e., $P(k) = \frac{1}{m} e e^T$ for all $k$ (see
Lemma 4.1). As mentioned, the Markov randomized method with uniform transition
probability matrices $P(k)$ reduces to the incremental method with randomization
in [20]. In this case, choosing $T = 0$ in (4.19) is optimal and the resulting bound is
$f^* + \frac{\alpha}{2} C^2$, with $C = \max_{1 \le i \le m} C_i$. We note that this bound is better by a factor
of $m$ than the corresponding bound for the incremental method with randomization
given in Proposition 3.1 in [20].

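When transition matrices are non-uniform ($\beta > 0$), and good estimates of the
bounds $C_i$ on subgradient norms and the diameter of the set $X$ are available, one
may optimize the error bound in (4.19) with respect to integer $T$ for $T \ge 0$. In
particular, one may optimize the term $\alpha T C^2 + b \big( \sum_{i=1}^m C_i \big) \beta^{T+1} \max_{x,y \in X} \|x - y\|$
over integers $T \ge 0$. It can be seen that the optimal integer $T^*$ is given by

$$T^* = \begin{cases} 0 & \text{when } \frac{\alpha C^2}{C_0 (-\ln \beta)} \ge 1, \\ \Big\lceil (\ln \beta)^{-1} \ln \Big( \frac{\alpha C^2}{C_0 (-\ln \beta)} \Big) \Big\rceil - 1 & \text{when } \frac{\alpha C^2}{C_0 (-\ln \beta)} < 1, \end{cases} \tag{4.20}$$

where $C_0 = b \big( \sum_{i=1}^m C_i \big) \max_{x,y \in X} \|x - y\|$.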
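
When the constants are known, the trade-off in $T$ can also be resolved by direct search instead of a closed form. The sketch below minimizes $g(T) = \alpha T C^2 + C_0 \beta^{T+1}$ over nonnegative integers; it exploits that $g$ is convex in $T$ (a linear term plus a decaying exponential), so the search can stop at the first non-improvement. The function names and the numbers are illustrative assumptions.

```python
def tradeoff(T, alpha, C, C0, beta):
    """g(T) = alpha*T*C^2 + C0*beta^(T+1), the T-dependent part of the error bound."""
    return alpha * T * C ** 2 + C0 * beta ** (T + 1)


def best_T(alpha, C, C0, beta, T_max=100000):
    """Integer minimizer of g over 0 <= T <= T_max.

    g is convex in T, so its increments are nondecreasing and the first T
    that fails to improve on T - 1 marks the minimum.
    """
    best = 0
    for T in range(1, T_max + 1):
        if tradeoff(T, alpha, C, C0, beta) < tradeoff(best, alpha, C, C0, beta):
            best = T
        else:
            break
    return best


# Hypothetical constants: slow mixing (beta = 0.5) and a small step-size.
T_star = best_T(alpha=0.01, C=1.0, C0=10.0, beta=0.5)
print(T_star, tradeoff(T_star, 0.01, 1.0, 10.0, 0.5))
```

A larger $T$ lets the chain mix before the bound is applied, at the price of a drift term that grows linearly in $T$; for uniform transition matrices ($\beta = 0$) the search returns $T = 0$.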

A similar expression for the optimal $T^*$ in the presence of subgradient errors can be
obtained, but it is rather cumbersome. Furthermore, such an expression (as well as the
preceding one) may not be of practical importance when the bounds $C_i$, the diameter
of the set $X$, and the bounds $\mu$ and $\nu$ on the error moments are only "roughly" known. In
this case, a simpler bound can be obtained by just comparing the values $\alpha$ and $\beta$, as
given in the following.

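Corollary 4.5. Let the conditions of Theorem 4.4 hold. Then,

$$\liminf_{k \to \infty} E[f(x_k)] \le f^* + \mu \max_{x,y \in X} \|x - y\| + \alpha \Big( \frac{1}{2} (\nu + C)^2 + T C (C + \nu) + b \Big( \sum_{i=1}^m C_i \Big) \max_{x,y \in X} \|x - y\| \Big),$$

where

$$T = \begin{cases} 0 & \text{if } \alpha \ge \beta, \\ \Big\lceil \frac{\ln(\alpha)}{\ln(\beta)} \Big\rceil - 1 & \text{if } \alpha < \beta. \end{cases}$$

Furthermore, with probability 1, the same estimate holds for $\inf_k f(x_k)$.

Proof. When $\alpha \ge \beta$, choose $T = 0$. In this case, since $\beta \le \alpha$, from Theorem 4.4 we get

$$\liminf_{k \to \infty} E[f(x_k)] \le f^* + \mu \max_{x,y \in X} \|x - y\| + \alpha \Big( \frac{1}{2} (\nu + C)^2 + b \Big( \sum_{i=1}^m C_i \Big) \max_{x,y \in X} \|x - y\| \Big).$$

When $\alpha < \beta$, we can choose $T = \big\lceil \frac{\ln(\alpha)}{\ln(\beta)} \big\rceil - 1$, so that $\beta^{T+1} \le \alpha$. Then, from Theorem 4.4,

$$\liminf_{k \to \infty} E[f(x_k)] \le f^* + \mu \max_{x,y \in X} \|x - y\| + \alpha \Big( \frac{1}{2} (\nu + C)^2 + C (C + \nu) \Big( \Big\lceil \frac{\ln(\alpha)}{\ln(\beta)} \Big\rceil - 1 \Big) + b \Big( \sum_{i=1}^m C_i \Big) \max_{x,y \in X} \|x - y\| \Big).$$

□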
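
The rule of comparing $\alpha$ with $\beta$ is simple enough to state as code. The sketch below returns $T = 0$ when $\alpha \ge \beta$ and $T = \lceil \ln\alpha / \ln\beta \rceil - 1$ otherwise, and the usage lines check the property $\beta^{T+1} \le \alpha$ that makes the mixing term in the bound proportional to $\alpha$. It assumes $0 < \alpha$ and $0 \le \beta < 1$; the function name is hypothetical.

```python
import math


def window_from_stepsize(alpha, beta):
    """T = 0 if alpha >= beta, else ceil(ln(alpha) / ln(beta)) - 1.

    In the second case both logarithms are negative (0 < alpha < beta < 1),
    so the ratio is positive and beta**(T + 1) <= alpha.
    """
    if alpha >= beta:
        return 0
    return math.ceil(math.log(alpha) / math.log(beta)) - 1


for alpha, beta in ((0.6, 0.5), (0.01, 0.5), (0.001, 0.9)):
    T = window_from_stepsize(alpha, beta)
    print(alpha, beta, T, beta ** (T + 1) <= alpha)
```

Since $T$ grows only logarithmically as $\alpha$ decreases, the term $\alpha T C (C + \nu)$ still vanishes as $\alpha \to 0$.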

It can be seen that the error bounds in (4.20) and Corollary 4.5 converge to zero
as $\alpha \to 0$. This is not surprising in view of the convergence of the method with a
diminishing step-size.

As discussed earlier, the error bound in [9] is obtained assuming that there are
no errors in subgradient evaluations and that the sequence of computing agents forms
a homogeneous Markov chain. Here, while we relax these assumptions, we make the
additional assumption that the set $X$ is bounded.

A direct comparison between the bound in Corollary 4.5 and the results in [9] is
not possible. However, some qualitative comparisons on the nature of the bounds can
be made. The bound in [9] is obtained for each individual agent's sequence of iterates
(by sampling the iterates). This is a stronger result than our results in (4.20) and
Corollary 4.5, which provide guarantees only on the entire iterate sequence (and not
on the sequence of iterates at an individual agent). However, the bound in [9] depends
on the entire network topology, through the probability transition matrix $P$ of the
Markov chain. Thus, the bound can be evaluated only when the complete network
topology is available. In contrast, our bounds given in (4.20) and Corollary 4.5 can
be evaluated without knowing the network topology. We require that the topology
satisfies a connectivity assumption, as specified by Assumption 4, but we do not
assume the knowledge of the exact network topology.

5. Discussion. Incremental algorithms form the middle ground between selfish 
agent behavior and complete network cooperation. Each agent can be viewed to be 
selfish, as it adjusts the iterate only using its own cost function. At the same time, 
the agents also cooperate by passing the iterate to a neighbor so that he may factor 
in his opinion by adjusting the iterate using his cost function. Through Theorems 3.3
and 4.3, it was observed that a system level global optimum could still be obtained
through some amount of cooperation. This can be construed as a statement of Adam 
Smith's invisible hand hypothesis in a more semi-cooperative market setting. 

The results we have obtained are asymptotic in nature. The key step in deal- 
ing with both the incremental algorithms was to obtain the basic iterate equation 
(Lemmas 3.1 and 4.2). This was then combined with standard stochastic analysis
techniques to obtain asymptotic results. While we have restricted ourselves to estab- 
lishing only convergence results, it is possible to combine the techniques in [19] with 
the basic iterate relation to obtain bounds on the expected rate of convergence of the 
algorithms. Finally, we have only listed a few possible applications for the results in 
this paper. The problem of aligning and coordinating mobile agents can also be cast 
in the optimization framework studied in this paper and the results in this paper, 
especially the results on the Markov randomized subgradient algorithm, can be used to
design suitable alignment algorithms.